design – File parsing in UI Layer or Application Services Layer

Let’s say that I have a list of financial transactions that I need to read in from a file. I want to make the best guess I can at which account should be credited or debited, based on how the transaction memo compares to past transactions.

For example, if Wal-Mart was previously categorized as ‘Shopping’, then a transaction read from the file with Wal-Mart as the description should default to ‘Shopping’. If no exact match is found, the application should make its best guess and ask the user to confirm it. If there is no reasonable guess at all, the user should be asked which account makes the most sense.
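
The matching rule described above can be sketched as a pure function that returns a suggestion plus a confidence label and leaves the prompting to whatever front end is in use. This is only an illustration: the name `suggest_account`, the `history` mapping, and the use of `difflib` for the "best guess" step are assumptions, not part of the original design.

```python
from difflib import get_close_matches


def suggest_account(memo, history, cutoff=0.6):
    """Suggest an account for a memo.

    `history` is a hypothetical mapping of past memos to accounts.
    Returns (account, confidence), where confidence is "exact",
    "guess" (the user should confirm) or "unknown" (the user must choose).
    """
    normalized = {past.strip().lower(): account for past, account in history.items()}
    key = memo.strip().lower()

    # 1. A past transaction with the same memo decides the account.
    if key in normalized:
        return normalized[key], "exact"

    # 2. Otherwise fall back to the closest past memo, if one is close enough.
    close = get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    if close:
        return normalized[close[0]], "guess"

    # 3. No usable match: the caller has to ask the user.
    return None, "unknown"


history = {"Wal-Mart": "Shopping", "Shell Oil": "Fuel"}
print(suggest_account("Wal-Mart", history))        # ('Shopping', 'exact')
print(suggest_account("WAL-MART #1234", history))  # ('Shopping', 'guess')
print(suggest_account("Dentist", history))         # (None, 'unknown')
```

Because the function never prompts, a CLI could ask the user interactively while a REST client could return the "guess" and "unknown" rows for confirmation in a later request.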

To me, there is a lot of interaction with the user so it would make sense that this should all live in the UI layer. Once all the transactions are paired with accounts, then it should be sent to the Application Service layer to be saved.

Right now I’m just using a CLI, so I could inject an object that inherits from a ‘Presenter’ interface that the Application Service uses; however, this will not work when I get rid of the CLI and want to use a REST API around the Application Service layer.

Does it make sense to just include all this logic in the UI layer?

I am confused about when Oracle Database won’t do parsing

I am confused about when Oracle Database skips parsing.
The AWR report has a metric called “execute to parse”; as it increases, more SQL statements are executed without being parsed.
But the Oracle documentation says:
“When an application issues a SQL statement, the application makes a parse call to the database to prepare the statement for execution.”
That suggests a parse call is made every time a SQL statement is issued.
So I am wondering: when does Oracle skip parsing, so that “execute to parse” becomes larger?
Or have I just misunderstood?
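
One common reading of the metric, stated here as an assumption rather than quoted from the AWR documentation, is that it compares the number of parse calls with the number of executions. Under that reading the ratio rises when an application parses a statement once (for example by keeping a prepared statement or cursor open) and then executes it many times. The arithmetic can be sketched as follows; the function name and the sample counts are hypothetical.

```python
def execute_to_parse_pct(parse_calls, executions):
    """Percentage of executions that did not need their own parse call.

    Assumed formula: 100 * (executions - parse_calls) / executions.
    """
    if executions == 0:
        return 0.0
    return 100.0 * (executions - parse_calls) / executions


# Every execution is preceded by its own parse call.
print(execute_to_parse_pct(parse_calls=1000, executions=1000))  # 0.0

# Statements are parsed once and then re-executed ten times on average.
print(execute_to_parse_pct(parse_calls=100, executions=1000))   # 90.0
```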

rust – Parsing “my lhs = my rhs”

Just below is a nom parser which can parse one-line expressions like my lhs = my rhs:

Cargo.toml

[package]
name = "basic-test"
version = "0.0.0"
edition = "2018"

[dependencies]
nom = "6.1.2"

src/main.rs

/// Nom shortcuts.
mod nom_prelude {
    pub use nom::{
        IResult,
        combinator::{ map, recognize },
        multi::{ many0, many1 },
        sequence::{ terminated, separated_pair },
    };

    pub mod complete {
        pub use nom::{
            bytes::complete::tag,
            character::complete::{ anychar, none_of, space0 },
        };
    }
}

/// Parsing stuff.
mod parse {
    use super::nom_prelude::{*, complete::*};

    // TYPES/TRAITS DEFINITIONS AROUND NOM (treating inputs as &str):

    /// The nom input type.
    type ParseInput<'a> = &'a str;

    /// The nom result type, considering the defined input type.
    type ParseResult<'a, R> = IResult<ParseInput<'a>, R>;

    /// The type of a nom parser, considering the defined input type.
    trait Parser<'a, R>: FnMut(ParseInput<'a>) -> ParseResult<'a, R> {}
    impl<'a, R, F> Parser<'a, R> for F
        where F: FnMut(ParseInput<'a>) -> ParseResult<'a, R>
    {}

    // PARSE FUNCTIONS (HOW TO IMPROVE THE FOLLOWING CODE?):

    /// Parses an equal statement, like "FOO = BAR".
    pub fn equal_statement<'a>(input: ParseInput<'a>) -> ParseResult<(&'a str, &'a str)> {
        separated_pair(
            equal_statement_lhs,
            terminated(tag("="), space0),
            equal_statement_rhs,
        )
        (input)
    }

    /// Parses the left-hand side of an equal statement.
    fn equal_statement_lhs<'a>(input: ParseInput<'a>) -> ParseResult<&'a str> {
        map(
            recognize(
                many1(none_of("="))
            ),
            |s: &'a str| s.trim()
        )
        (input)
    }

    /// Parses the right-hand side of an equal statement (the remaining characters of the input).
    fn equal_statement_rhs<'a>(input: ParseInput<'a>) -> ParseResult<&'a str> {
        recognize(many0(anychar))
        (input)
    }
}

fn main() {
    let input = "my shortcut  =   /some/path";
    let parsed = parse::equal_statement(input);
    println!("{:?}", parsed);
}

Output

Ok(("", ("my shortcut", "/some/path")))

With the given code, the parsing is working like this:

  1. Take all characters until you meet an equal sign “=”;
  2. Right trim the taken chars and store the result as the LHS;
  3. Skip the equal sign “=” and the following spaces (pattern =(SPACES*));
  4. Take all the remaining characters;
  5. Store the taken chars as the RHS.

I want the code to be simplified if possible, and to behave like this instead:

  1. Take all characters until you meet the pattern (SPACES*)=(SPACES*);
  2. Store the taken chars as the LHS;
  3. Skip the already parsed equal sign pattern (SPACES*)=(SPACES*);
  4. Take all the remaining characters;
  5. Store the taken chars as the RHS.

How can I improve/simplify the code?

  • especially the part commented with // PARSE FUNCTIONS (HOW TO IMPROVE THE FOLLOWING CODE?);
  • I don’t like the use of trim(), which looks redundant (it re-parses something that has already been parsed).
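
To pin down the desired behaviour independently of nom, here is a minimal sketch in Python that treats (SPACES*)=(SPACES*) as a single separator, so no trimming step is needed afterwards. It is a language-neutral illustration of the target semantics, not a nom solution; the names `SEPARATOR` and `equal_statement` and the rejection of an empty left-hand side are assumptions.

```python
import re

# (SPACES*)=(SPACES*): the equal sign together with any spaces around it.
SEPARATOR = re.compile(r" *= *")


def equal_statement(line):
    """Split "LHS = RHS" at the first separator and return (lhs, rhs)."""
    match = SEPARATOR.search(line)
    # Assumption: a missing "=" or an empty left-hand side is an error.
    if match is None or match.start() == 0:
        raise ValueError("expected 'LHS = RHS', got %r" % line)
    # Everything before the separator is the LHS; the rest is the RHS.
    return line[:match.start()], line[match.end():]


print(equal_statement("my shortcut  =   /some/path"))
# ('my shortcut', '/some/path')
```

Spaces inside the left-hand side are kept because the separator only matches a run of spaces that is immediately followed by "=".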

java – Parsing Semver versions

I have written my own take on semantic versioning. Parsing it is not really hard, but I feel my parsing could be more efficient, more readable, and feel more like a parser. Currently there is an unread method that I don’t see in most parsers, so I would like to get rid of it if possible, and the two methods readPreRelease() and readBuild() feel too complex.

I’m only interested in parsing so my Version class got cleaned of equals, hashCode, compareTo and toString methods and I removed the related tests. If required I could re-add them, but to me that is superfluous in this code review request.

For this code review, I would like to:

  • Make the code more readable
  • Let the code go forward and avoid going backwards (remove unread and all those currentPosition - 1), unless necessary.
  • Avoid having so many booleans in the methods readPreRelease() and readBuild().
  • Write general remarks about the code if any.

My code provides three classes:

  • the VersionParser, which does the actual parsing.
  • the Version class, which was stripped down because I don’t want it reviewed; it’s rather simple, and the focus here is the parser. The parsing entry point is here, through the valueOf method.
  • the testing class I used to make sure my parsing is correct.

Below those classes, you can see the BNF grammar for reference.

Please note that I removed comments, as I want my code to be self-explanatory, so if anything is unclear, that’s something that should be factored into the review.

import java.util.ArrayList;
import java.util.List;

import static java.util.Objects.requireNonNull;

final class VersionParser {

  private final String source;
  private int currentPosition = 0;

  VersionParser(String source) {
    this.source = requireNonNull(source);
  }

  Version parse() {
    var major = readNumericIdentifier();
    consume('.');
    var minor = readNumericIdentifier();
    consume('.');
    var patch = readNumericIdentifier();
    var preRelease = List.<String>of();
    if (peek() == '-') {
      consume('-');
      preRelease = readPreReleases();
    }
    var build = List.<String>of();
    if (peek() == '+') {
      consume('+');
      build = readBuilds();
    }
    check(isAtEnd(), "Unexpected characters in \"%s\" at %d", source, currentPosition - 1);
    return new Version(major, minor, patch, preRelease, build);
  }

  private boolean isDigit(int c) {
    return '0' <= c && c <= '9';
  }

  private boolean isAlpha(int c) {
    return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
  }

  private boolean isNonDigit(int c) {
    return isAlpha(c) || c == '-';
  }

  private boolean isAtEnd() {
    return currentPosition >= source.length();
  }

  private int advance() {
    var c = source.charAt(currentPosition);
    currentPosition++;
    return c;
  }

  private int peek() {
    if (isAtEnd()) {
      return -1;
    }
    return source.charAt(currentPosition);
  }

  private void unread() {
    currentPosition--;
  }

  private void check(boolean expression, String messageFormat, Object... arguments) {
    if (!expression) {
      var message = String.format(messageFormat, arguments);
      throw new IllegalArgumentException(message);
    }
  }

  private void consume(char expected) {
    check(!isAtEnd(), "Early end in \"%s\"", source);
    var c = advance();
    check(c == expected, "Expected %c, got %c in \"%s\" at position %d", expected, c, source, currentPosition - 1);
  }

  private int readNumericIdentifier() {
    check(!isAtEnd(), "Early end in \"%s\"", source);
    var start = currentPosition;
    var c = advance();
    check(isDigit(c), "Expected a digit, got %c in \"%s\" at position %d", c, source, currentPosition - 1);
    if (c == '0') {
      return 0;
    }
    while (!isAtEnd()) {
      c = advance();
      if (!isDigit(c)) {
        unread();
        break;
      }
    }
    var string = source.substring(start, currentPosition);
    return Integer.parseInt(string);
  }

  private List<String> readPreReleases() {
    var preReleases = new ArrayList<String>();
    preReleases.add(readPreRelease());
    while (true) {
      if (peek() != '.') {
        return preReleases;
      }
      consume('.');
      preReleases.add(readPreRelease());
    }
  }

  /*
   * Basically, should be a valid number (without leading 0, unless for 0) or should contain at least one letter or dash.
   */
  private String readPreRelease() {
    var start = currentPosition;
    var isAllDigit = true;
    var startsWithZero = false;
    var isEmpty = true;
    while (!isAtEnd()) {
      var c = advance();
      var isDigit = isDigit(c);
      var isNonDigit = isNonDigit(c);
      if (!isDigit && !isNonDigit) {
        unread();
        break;
      }
      if (isEmpty && c == '0') {
        startsWithZero = true;
      }
      isEmpty = false;
      isAllDigit &= isDigit;
    }
    check(!isEmpty, "Empty preRelease part in \"%s\" at %d", source, currentPosition - 1);
    var length = currentPosition - start;
    var doesNotStartWithZero = !isAllDigit || !startsWithZero || length == 1;
    check(doesNotStartWithZero, "Numbers may not start with 0 except 0 in \"%s\" at position %d", source, start);
    return source.substring(start, currentPosition);
  }

  private List<String> readBuilds() {
    var builds = new ArrayList<String>();
    builds.add(readBuild());
    while (true) {
      if (peek() != '.') {
        return builds;
      }
      consume('.');
      builds.add(readBuild());
    }
  }

  private String readBuild() {
    var start = currentPosition;
    var isEmpty = true;
    while (!isAtEnd()) {
      var c = advance();
      var isDigit = isDigit(c);
      var isNonDigit = isNonDigit(c);
      if (!isDigit && !isNonDigit) {
        unread();
        break;
      }
      isEmpty = false;
    }
    check(!isEmpty, "Empty build part in \"%s\" at %d", source, currentPosition - 1);
    return source.substring(start, currentPosition);
  }

}

The Version class that hides the parser.

import java.util.List;

public final class Version {

  public static Version valueOf(String s) {
    return new VersionParser(s).parse();
  }

  private final int major;
  private final int minor;
  private final int patch;
  private final List<String> preRelease;
  private final List<String> build;

  Version(int major, int minor, int patch, List<String> preRelease, List<String> build) {
    this.major = major;
    this.minor = minor;
    this.patch = patch;
    this.preRelease = List.copyOf(preRelease);
    this.build = List.copyOf(build);
  }

  // getters, equals, hashCode, toString, compareTo (+ implement Comparable)
  
}

The test class to make sure the parsing works. Requires Junit and AssertJ.

import org.junit.jupiter.params.*;
import org.junit.jupiter.params.provider.*;

import java.util.*;

import static java.util.stream.Collectors.toList;
import static org.assertj.core.api.Assertions.*;

class VersionTest {

  @ParameterizedTest
  @MethodSource("provideCorrectVersions")
  void testVersion_correct(String correctVersion) {
    assertThat(Version.valueOf(correctVersion))
        .isNotNull();
  }

  private static List<Arguments> provideCorrectVersions() {
    var versions = new String[] {
        "0.0.4", "1.2.3", "10.20.30", "1.1.2-prerelease+meta", "1.1.2+meta",
        "1.1.2+meta-valid", "1.0.0-alpha", "1.0.0-beta", "1.0.0-alpha.beta",
        "1.0.0-alpha.beta.1", "1.0.0-alpha.1", "1.0.0-alpha0.valid",
        "1.0.0-alpha.0valid",
        "1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay",
        "1.0.0-rc.1+build.1", "2.0.0-rc.1+build.123", "1.2.3-beta",
        "10.2.3-DEV-SNAPSHOT", "1.2.3-SNAPSHOT-123", "1.0.0", "2.0.0", "1.1.7",
        "2.0.0+build.1848", "2.0.1-alpha.1227", "1.0.0-alpha+beta",
        "1.2.3----RC-SNAPSHOT.12.9.1--.12+788",
        "1.2.3----R-S.12.9.1--.12+meta", "1.2.3----RC-SNAPSHOT.12.9.1--.12",
        "1.0.0+0.build.1-rc.10000aaa-kk-0.1",
        "999999999.999999999.999999999", "1.0.0-0A.is.legal"
    };
    return Arrays.stream(versions)
        .map(Arguments::of)
        .collect(toList());
  }

  @ParameterizedTest
  @MethodSource("provideIncorrectVersions")
  void testVersion_incorrect(String incorrectVersion) {
    assertThatThrownBy(() -> Version.valueOf(incorrectVersion))
        .isInstanceOf(IllegalArgumentException.class)
        .hasNoSuppressedExceptions();
  }

  private static List<Arguments> provideIncorrectVersions() {
    var versions = new String[] {
        "1", "1.2", "1.2.3-0123", "1.2.3-0123.0123", "1.1.2+.123", "1.2.3+",
        "+invalid", "-invalid", "-invalid+invalid", "-invalid.01", "alpha",
        "alpha.beta", "alpha.beta.1", "alpha.1", "alpha+beta", "alpha_beta",
        "alpha.", "alpha..", "beta", "1.0.0-alpha_beta", "-alpha.",
        "1.0.0-alpha..", "1.0.0-alpha..1", "1.0.0-alpha...1",
        "1.0.0-alpha....1", "1.0.0-alpha.....1", "1.0.0-alpha......1",
        "1.0.0-alpha.......1", "01.1.1", "1.01.1", "1.1.01", "1.2",
        "1.2.3.DEV", "1.2-SNAPSHOT",
        "1.2.31.2.3----RC-SNAPSHOT.12.09.1--..12+788", "1.2-RC-SNAPSHOT",
        "-1.0.3-gamma+b7718", "+justmeta", "9.8.7+meta+meta",
        "9.8.7-whatever+meta+meta",
        "999999999999999999.999999999999999999.999999999999999999",
        "999999999.999999999.999999999----RC-SNAPSHOT.12.09.1-------------..12"
    };
    return Arrays.stream(versions)
        .map(Arguments::of)
        .collect(toList());
  }

}

The Backus–Naur Form grammar, as taken from the semver.org website.

<valid semver> ::= <version core>
                 | <version core> "-" <pre-release>
                 | <version core> "+" <build>
                 | <version core> "-" <pre-release> "+" <build>

<version core> ::= <major> "." <minor> "." <patch>

<major> ::= <numeric identifier>

<minor> ::= <numeric identifier>

<patch> ::= <numeric identifier>

<pre-release> ::= <dot-separated pre-release identifiers>

<dot-separated pre-release identifiers> ::= <pre-release identifier>
                                          | <pre-release identifier> "." <dot-separated pre-release identifiers>

<build> ::= <dot-separated build identifiers>

<dot-separated build identifiers> ::= <build identifier>
                                    | <build identifier> "." <dot-separated build identifiers>

<pre-release identifier> ::= <alphanumeric identifier>
                           | <numeric identifier>

<build identifier> ::= <alphanumeric identifier>
                     | <digits>

<alphanumeric identifier> ::= <non-digit>
                            | <non-digit> <identifier characters>
                            | <identifier characters> <non-digit>
                            | <identifier characters> <non-digit> <identifier characters>

<numeric identifier> ::= "0"
                       | <positive digit>
                       | <positive digit> <digits>

<identifier characters> ::= <identifier character>
                          | <identifier character> <identifier characters>

<identifier character> ::= <digit>
                         | <non-digit>

<non-digit> ::= <letter>
              | "-"

<digits> ::= <digit>
           | <digit> <digits>

<digit> ::= "0"
          | <positive digit>

<positive digit> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J"
           | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T"
           | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d"
           | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n"
           | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x"
           | "y" | "z"
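
As a cross-check on the grammar above, its productions can be transcribed almost one-to-one into a regular expression. The sketch below is in Python rather than Java and is not the reviewed parser; the constant names and `is_valid_semver` are hypothetical, and the pattern was written from the BNF rather than copied from any official source.

```python
import re

# <numeric identifier>: "0", or a positive digit followed by digits.
NUMERIC = r"(?:0|[1-9][0-9]*)"
# <alphanumeric identifier>: identifier characters with at least one non-digit.
ALPHANUMERIC = r"[0-9A-Za-z-]*[A-Za-z-][0-9A-Za-z-]*"
# <pre-release identifier>: alphanumeric identifier or numeric identifier.
PRE_RELEASE_ID = rf"(?:{ALPHANUMERIC}|{NUMERIC})"
# <build identifier>: alphanumeric identifier or digits (leading zeros allowed).
BUILD_ID = r"[0-9A-Za-z-]+"

SEMVER = re.compile(
    rf"(?P<major>{NUMERIC})\.(?P<minor>{NUMERIC})\.(?P<patch>{NUMERIC})"
    rf"(?:-(?P<prerelease>{PRE_RELEASE_ID}(?:\.{PRE_RELEASE_ID})*))?"
    rf"(?:\+(?P<build>{BUILD_ID}(?:\.{BUILD_ID})*))?"
)


def is_valid_semver(text):
    """Return True if the whole string matches <valid semver>."""
    return SEMVER.fullmatch(text) is not None


print(is_valid_semver("1.0.0-alpha.1"))  # True
print(is_valid_semver("1.2.3-0123"))     # False (numeric identifier with a leading zero)
```

The grammar places no upper bound on numeric identifiers, so a check like this accepts very long numbers that an int-based parser rejects.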

compilers – $\text{“The error entries in the goto table are never consulted”}$ – intuitive explanation of the above claim with respect to $LR$ parsing tables

I was going through the text Compilers: Principles, Techniques and Tools by Ullman et al., where I came across a claim:

$\text{“The error entries in the goto table are never consulted”}$

I feel that whenever there is a reduce move, the current state pointer moves some states back. Suppose that $A \rightarrow \alpha$ is the production used and for the viable prefix $\gamma\alpha$ the current state pointer moves back those many states (so as to pop the $\alpha$ from the stack) and goes to state say $I_n$ and there is a move from $I_n$ on $A$ to some state $I_{n'}$.

Why does PostgreSQL accept junk when parsing an INTERVAL

psql (11.8)
Type "help" for help.

public=# select '!@#$%^&*()00:00:00.01<>?<>:'::interval;
  interval
-------------
 00:00:00.01
(1 row)

Seems to take just about anything… I don’t see anything in the docs about this…

It’s kind of cool but also worrisome… can I rely on say '__00:00:00.01__' ALWAYS working?

And what are the REAL rules – the ones in the docs are not complete apparently…

python 3.x – After parsing, it is not possible to form data in the dictionary in a certain format

Help me understand how to compose an algorithm.
I take the <p> and <ul> tags from the source and try to build a list under the key ‘ParametersVariants’.

Part of the HTML source:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<tbody>

<tr>
    <td colspan="1" class="confluenceTd">
        <div class="content-wrapper"><p>1100a, 1100b, 1100c</p>
            </div>
    </td>
    <td colspan="1" class="confluenceTd">Увидел экран "Весь портфель"</td>
    <td colspan="1" class="confluenceTd"><br></td>
    <td colspan="1" class="confluenceTd">11000_Portf_Group_Show</td>
    <td colspan="1" class="confluenceTd"><p><span style="color: rgb(0,0,0);"><strong><u style="text-align: left;">ListAccID:&nbsp;</u></strong>перечень ID счетов, по которым ВКЛЮЧЕНЫ активы через&nbsp;, (запятую) в квадратных скобках, каждое значение в кавычках. Например,&nbsp;("321322","432432")</span>
    </p>
        <p><s><u><strong>AcсType:</strong></u></s>&nbsp;Не передаем, так как тут показывают весь портфель&nbsp;по всем
            счетам</p>
        <ul>
            <li><strong>Основной</strong></li>
            <li><strong>ИИС</strong></li>
        </ul>
        <p><u><strong>Section: </strong></u>(Где находится)</p>
        <ul>
            <li><strong>Position&nbsp;</strong>- Активы, позиции&nbsp;(экран 1102)</li>
            <li><strong>Order&nbsp;</strong>- Заявки (экран 1102)</li>
            <li><strong>History</strong> - История (экран 1501/экран 1601)</li>
            <li><s><strong>Operation</strong></s> - Операции</li>
        </ul>
        <p><span
                style="color: rgb(255,0,0);">--Так мы записываем комментарии, которые не являются доп параметром.</span>
        </p></td>
    <td colspan="1" class="confluenceTd"><br></td>
    <td colspan="1" class="confluenceTd"><br></td>
    <td colspan="1" class="confluenceTd"><br></td>
    <td colspan="1" class="confluenceTd"><br></td>
</tr>
</tbody>
</body>
</html>

As a result, extra lines are printed: the ‘ParameterName’ is duplicated with an empty ‘ParametersVariants’ list,
whereas the list should appear next to the corresponding ‘ParameterName’, not on a separate line.

(out):

{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'ListAccID:', 'ParametersVariants': []}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'AcсType:', 'ParametersVariants': []}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'AcсType:', 'ParametersVariants': ['Основной', 'ИИС']}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'Section: ', 'ParametersVariants': []}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'Section: ', 'ParametersVariants': ['Position', {'EventName': '11000_Portf_Group_Show', 'ParameterName': 'Section: ', 'ParametersVariants': []}

should be

{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'ListAccID:', 'ParametersVariants': []}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'AcсType:', 'ParametersVariants': ['Основной', 'ИИС']}
{'EventName': '11000_Portf_Group_Show', 'ParameterName': 'Section: ', 'ParametersVariants': ['Position', 'Order', 'History ', 'Operation ']}

my code

# -*- coding: UTF-8 -*-

import csv
import os
import pprint
import re
from bs4 import BeautifulSoup

tbody_2 = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tbody_2.html')


def prepare_events():
    html_part = open(tbody_2, 'r', encoding='utf-8').read()
    soup = BeautifulSoup(html_part, features='html.parser')
    rows = soup.find_all('tr')
    data = {}
    for row in rows:
        try:
            td = row.find_all(['th', 'td'])
            event_name = cleanhtml(td[3]).strip()
            data['EventName'] = event_name

            tags = td[4].find_all(['p', 'ul'])
            for tag in tags:
                param_name = tag.find('u')
                if param_name:
                    data['ParameterName'] = cleanhtml(param_name)

                data['ParametersVariants'] = [cleanhtml(li).split('-')[0] for li in tag.find_all('li')]
                # data['ParametersVarDescription'] = [cleanhtml(li).split('-')[1] for li in tag.find_all('li')]
                print(data)


        except IndexError:
            pass


def cleanhtml(html):
    return re.compile('<.*?>').sub('', str(html)).replace('\xa0', '')


if __name__ == '__main__':
    prepare_events()

Sorry for my google language!

python 3.x – Improving speed of parsing XML file

import pandas as pd
import pandas_read_xml as pdx
from lxml import etree
url2='http:'
url='http:'
tree=etree.parse(url)
st=etree.tostring(tree.getroot())

# The list "all_leaves" includes every node and leaf in the tree

from io import StringIO, BytesIO
#st above is encoded as binary

all_leaves=[]
events = ("start", "end")
# "start" are the xml tags start
# "end" are the xml tags with "</"
parent='root' #root of the xml tree
context = etree.iterparse(BytesIO(st), events=events)
for n, actel in enumerate(context):
  if actel[0]=='end': #this marks the end of a node
    parent = all_leaves[n-1][1] #use previous parent
    all_leaves.append((actel[0],parent,actel[1].tag))
    #append the action, parent and tag
  else:
    all_leaves.append((actel[0],parent,actel[1].tag))
    if actel[0]=='start': #the current become parent since this
                          #algorithm is depth first
      parent = actel[1].tag

# "leaves" includes only the first row of each table
leaves=[]
for entry in all_leaves:
  if (entry not in leaves) and (entry[0]=='start'):
    leaves.append(entry)


# These are the names of the tables
nf_nodes=[]
for entry in leaves[2:]:#ignore the first two:
  nf_nodes.append(entry[1])
nf_nodes=list(set(nf_nodes))


# The dictionary "all_dic" has keys labeled as the node
# above each set of leaves (i.e. DataFrames) with value equal to the corresponding
# DataFrame or leaf
all_dic={}
for l in nf_nodes:
    all_dic[l]=pdx.read_xml(url, ['DRFAssessments',l])

The code above reads and parses XML files. It takes about 10:30 seconds to run. I would like suggestions or edits that could make it faster. The code is used to parse the XML file before it is uploaded to a MySQL database. It also parses 20 tables, if that is useful to know.

rust – List parsing for ‘cut’

I’m new to Rust and am learning by implementing my own version of cut. This is a snippet that parses the <list> of ranges required for the -f, -b, or -c options. The relevant section of the spec states:

The application shall ensure that the option-argument list (see options -b, -c, and -f below) is a <comma>-separated list or <blank>-separated list of positive numbers and ranges. Ranges can be in three forms. The first is two positive numbers separated by a <hyphen-minus> (low-high), which represents all fields from the first number to the second number. The second is a positive number preceded by a <hyphen-minus> (-high), which represents all fields from field number 1 to that number. The third is a positive number followed by a <hyphen-minus> (low-), which represents that number to the last field, inclusive. The elements in list can be repeated, can overlap, and can be specified in any order, but the bytes, characters, or fields selected shall be written in the order of the input data. If an element appears in the selection list more than once, it shall be written exactly once.

I’m interested in tips for writing more idiomatic Rust (especially the error handling), and any other hints for a new Rust programmer. Thanks!

use std::error::Error;
use std::fmt;
use std::fmt::{Display, Formatter};
use std::iter::FromIterator;
use std::num::ParseIntError;

pub type Result<T> = std::result::Result<T, RangeError>;

#[derive(Debug, Eq, PartialEq)]
pub enum RangeError {
    MalformedRangSpec,
    Parse(ParseIntError),
}

impl Display for RangeError {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        use RangeError::*;
        let msg = match self {
            MalformedRangSpec => format!("Invalid range spec"),
            Parse(e) => e.to_string(),
        };
        write!(f, "Error: {}", msg)
    }
}

impl From<ParseIntError> for RangeError {
    fn from(e: ParseIntError) -> Self {
        RangeError::Parse(e)
    }
}

impl Error for RangeError {}

#[derive(Debug, Eq, PartialEq, Hash)]
pub enum Range {
    From(usize),
    To(usize),
    Inclusive(usize, usize),
    Singleton(usize),
}

#[derive(Debug, Eq, PartialEq)]
pub struct RangeSet {
    ranges: Vec<Range>,
}

impl RangeSet {
    pub fn from<I: IntoIterator<Item = Range>>(iter: I) -> RangeSet {
        RangeSet {
            ranges: Vec::from_iter(iter),
        }
    }

    pub fn from_spec<T: AsRef<str>>(spec: T) -> Result<RangeSet> {
        // "-5,10,14-17,20-"
        let tuples = spec
            .as_ref()
            .split(|c| c == ',' || c == ' ') // e.g. ["-5", "10", "14-17", "20-"]
            // [[None, Some(5)], [Some(10)], [Some(14), Some(17)], [Some(20), None]]
            .map(|element| {
                element
                    .split('-') // e.g. first iter: ["", "5"]
                    .map(|bound| match bound {
                        // e.g. [None, Some(5)]
                        "" => Ok(None),
                        s => {
                            let n: usize = s.parse()?;
                            Ok(Some(n))
                        }
                    })
                    .collect::<Result<Vec<_>>>()
            })
            .collect::<Result<Vec<_>>>()?;

        let ranges: Vec<Range> = tuples
            .iter()
            .map(|range| match range.as_slice() {
                [Some(n)] => Ok(Range::Singleton(*n)),
                [Some(s), Some(e)] => Ok(Range::Inclusive(*s, *e)),
                [Some(s), None] => Ok(Range::From(*s)),
                [None, Some(e)] => Ok(Range::To(*e)),
                _ => Err(RangeError::MalformedRangSpec),
            })
            .collect::<Result<Vec<_>>>()?;

        Ok(RangeSet::from(ranges))
    }

    pub fn contains(&self, n: usize) -> bool {
        if n == 0 {
            // range defined to start at 1
            return false;
        }

        self.ranges.iter().any(|range| match range {
            Range::From(from) => (*from..).contains(&n),
            Range::To(to) => (1..=*to).contains(&n),
            Range::Inclusive(from, to) => (*from..=*to).contains(&n),
            Range::Singleton(s) => s == &n,
        })
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn contains() {
        let r = RangeSet::from(vec![
            Range::From(100),
            Range::Inclusive(50, 60),
            Range::Singleton(40),
            Range::To(10),
        ]);

        for n in 0..1000 {
            match n {
                1..=10 | 40..=40 | 50..=60 | 100..=1000 => {
                    assert!(r.contains(n), "should contain {}", n)
                }
                _ => assert!(!r.contains(n), "shouldn't contain {}", n),
            }
        }
    }

    #[test]
    fn from_spec() {
        let r1 = RangeSet::from(vec![Range::Singleton(1)]);
        let r2 = RangeSet::from_spec("1");
        assert_eq!(Ok(r1), r2);

        let r1 = RangeSet::from(vec![
            Range::To(10),
            Range::Singleton(40),
            Range::Inclusive(50, 60),
            Range::From(100),
        ]);

        let r2 = RangeSet::from_spec("-10,40,50-60,100-");

        assert_eq!(Ok(r1), r2);
    }

    #[test]
    fn from_spec_bad() {
        assert!(RangeSet::from_spec("b").is_err());
        assert!(RangeSet::from_spec("4-5-6").is_err());
    }
}