Project author: peterkuma

Project description:
Plain Structured Text – data format suitable for I/O & command line
Language: Python
Repository: git://github.com/peterkuma/pst.git
Created: 2019-02-04T23:25:54Z
Project page: https://github.com/peterkuma/pst

License: The Unlicense

Plain Structured Text (PST)

PST is a format for encoding structured text, similar to Bourne shell formatting
and JSON. PST supports strings, numbers (integers and floating-point), booleans,
missing values (none), arrays, objects (key-value pairs),
single-character flags (-x), and string flags (--abc).
PST is simpler than JSON, while supporting most of its features.
PST aims to be both human- and machine-readable, and suitable for command-line
arguments, standard input/output and configuration files. PST is similar to
YAML, but supports one-line expressions (indentation does not matter).

Implementations of PST as a command-line program and a Python 3 function
are available.

Complex example

This example is adapted from Wikipedia and is licensed under the
CC BY-SA 3.0 license.

PST:

    firstName: John
    lastName: Smith
    isAlive: true
    age: 27
    address: {{
        streetAddress: "21 2nd Street"
        city: "New York"
        state: NY
        postalCode: 10021-3100
    }}
    phoneNumbers: {
        {{ type: home number: "212 555-1234" }}
        {{ type: office number: "646 555-4567" }}
        {{ type: mobile number: "123 456-7890" }}
    }
    children: { }
    spouse: none

JSON:

    {
      "firstName": "John",
      "lastName": "Smith",
      "isAlive": true,
      "age": 27,
      "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021-3100"
      },
      "phoneNumbers": [
        {
          "type": "home",
          "number": "212 555-1234"
        },
        {
          "type": "office",
          "number": "646 555-4567"
        },
        {
          "type": "mobile",
          "number": "123 456-7890"
        }
      ],
      "children": [],
      "spouse": null
    }

The same PST can be supplied as command-line arguments (albeit in a very long
command):

    pst firstName: John lastName: Smith isAlive: true age: 27 address: {{ \
    streetAddress: "21 2nd Street" city: "New York" state: NY postalCode: \
    10021-3100 }} phoneNumbers: { {{ type: home number: "212 555-1234" }} \
    {{ type: office number: "646 555-4567" }} {{ type: mobile number: \
    "123 456-7890" }} } children: { } spouse: none

This outputs the following JSON:

    {"children": [], "phoneNumbers": [{"number": "212 555-1234", "type": "home"}, {"number": "646 555-4567", "type": "office"}, {"number": "123 456-7890", "type": "mobile"}], "firstName": "John", "isAlive": true, "spouse": null, "age": 27, "lastName": "Smith", "address": {"state": "NY", "streetAddress": "21 2nd Street", "city": "New York", "postalCode": "10021-3100"}}
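
The single-line output lists the keys in a different order than the formatted
JSON shown earlier, but both encode the same data. As a quick check, the two
can be parsed with Python's standard json module and compared. This is only an
illustrative sketch; the variable names are made up and are not part of PST.

```python
import json

# The formatted JSON from the example above.
formatted = """
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "office", "number": "646 555-4567"},
    {"type": "mobile", "number": "123 456-7890"}
  ],
  "children": [],
  "spouse": null
}
"""

# The single-line JSON printed by the pst command (split here for readability).
command_output = (
    '{"children": [], "phoneNumbers": [{"number": "212 555-1234", "type": "home"}, '
    '{"number": "646 555-4567", "type": "office"}, '
    '{"number": "123 456-7890", "type": "mobile"}], "firstName": "John", '
    '"isAlive": true, "spouse": null, "age": 27, "lastName": "Smith", '
    '"address": {"state": "NY", "streetAddress": "21 2nd Street", '
    '"city": "New York", "postalCode": "10021-3100"}}'
)

# Key order is irrelevant when comparing parsed JSON objects (Python dicts).
same_data = json.loads(formatted) == json.loads(command_output)
print(same_data)
```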

Informal description

PST is composed of a sequence of words, which encode elementary types
such as strings, integers, floating-point numbers or arbitrarily nested complex
types such as arrays (list) and objects (dict).

Strings do not need to be quoted unless they contain white space or special
characters, or could be interpreted as a number or a bracket. Words composed of
digits are implicitly converted to numbers unless quoted.

Curly brackets enclose arrays. Double curly brackets enclose explicit objects.
Objects are composed of key-value pairs, which can be located inline
(implicit objects) or inside double curly brackets (explicit objects).
Unlike implicit objects, it is possible to use explicit objects as the value of
a key-value pair. Flags beginning with a dash and double dash are converted
to key-value pairs.

Any amount of white space or indentation is equivalent to a single space.
Separation between words, brackets and special characters matters, however:
for example, the colon (:) ending the key of a key-value pair must be followed
by white space before the value.
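
To illustrate why the amount of white space does not matter, here is a minimal
Python sketch of splitting text into words. It is not the PST implementation:
the function name split_words is hypothetical, it works on str rather than
bytes, and it keeps quotes and backslashes in the returned words instead of
interpreting them.

```python
WHITE_SPACE = " \f\n\r\t\v"


def split_words(text):
    """Split text into words on white space, except inside double quotes."""
    words = []
    current = []
    in_quotes = False
    escaped = False
    for ch in text:
        if escaped:
            # A character preceded by a backslash is kept as-is.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch in WHITE_SPACE and not in_quotes:
            # White space outside quotes ends the current word (if any).
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


one_line = "a: 1 b: { x y }"
indented = "a:    1\n\tb: {\n  x\n  y\n}"
print(split_words(one_line) == split_words(indented))
```

Both inputs split into the same seven words, so they describe the same PST.
By contrast, a:1 (without the space) stays a single word and is therefore not
split into a key and a value.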

8-bit ASCII-compatible character encoding is assumed. Strings can contain any
binary data by using escape characters. Conversion from UTF-8 character
encoding to Unicode is supported by the Python PST API.

PST is designed to be compatible with JSON, while also being suitable for
command-line argument passing. For example, special characters which would
clash with other uses are not used: ( and ) have a special meaning in Bash,
and [ and ] are commonly used in documentation of command-line programs
to denote optional arguments. Implicit objects make it easy to denote named
command-line arguments. Flags ensure established syntax can be used to express
command-line arguments. Arrays and objects enable complex
command-line arguments. The absence of commas, and of mandatory quoting for
common strings, makes PST easier to write than JSON.

Examples

    # Empty
    PST:
    JSON: null

    # Single string
    PST: a
    JSON: "a"

    # Quoted string
    PST: "a b"
    JSON: "a b"

    # Partially-quoted string
    PST: a"b c"
    JSON: "ab c"

    # Two strings
    PST: a b
    JSON: ["a", "b"]

    # Two strings separated by a newline
    PST:
    a
    b
    JSON: ["a", "b"]

    # Key-value pair
    PST: a: 1
    JSON: {"a": 1}

    # Sequence of key-value pairs
    PST: a: 1 b: 2
    JSON: {"a": 1, "b": 2}

    # Sequence of key-value pairs and a string
    PST: a: 1 b: 2 c
    JSON: [{"a": 1, "b": 2}, "c"]

    # Empty array
    PST: { }
    JSON: []

    # String and an empty array
    PST: a { }
    JSON: ["a", []]

    # Empty object
    PST: {{ }}
    JSON: {}

    # String and an empty object
    PST: a {{ }}
    JSON: ["a", {}]

    # String and an array
    PST: a { b c }
    JSON: ["a", ["b", "c"]]

    # An array as value followed by a string
    PST: a: { b c } d
    JSON: [{"a": ["b", "c"]}, "d"]

    # Literals
    PST: true false none
    JSON: [true, false, null]

    # Single-character flags
    PST: -ab
    JSON: {"a": true, "b": true}

    # String flag
    PST: --ab
    JSON: {"ab": true}

Usage

Command line interface

pst

    pst <pst>...

Convert PST-formatted arguments to JSON. Prints JSON to the standard output.

pstf

    pstf < input.pst

Convert PST-formatted standard input to JSON. Prints JSON to the standard
output.

Python interface

    import pst

decode

    pst.decode(s, as_unicode=False)

Decode PST. s is a PST binary string or a list of such strings. If as_unicode
(bool) is True, strings are converted to Unicode on output, assuming the UTF-8
encoding. Invalid UTF-8 bytes are encoded using the “surrogateescape” error
handler in the U+DCxx Unicode range.
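
The “surrogateescape” behaviour mentioned above is a standard Python error
handler and can be tried without PST. The following minimal sketch uses only
built-in bytes and str methods; the sample bytes are made up for illustration.

```python
# Valid UTF-8 for "café", followed by a space and an invalid byte 0xFF.
raw = b"caf\xc3\xa9 \xff"

# Invalid bytes are mapped to lone surrogates in the U+DC80..U+DCFF range.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # ascii() avoids printing the lone surrogate directly

# Encoding with the same error handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```

Because the mapping is reversible, binary data embedded in strings survives a
decode and encode round trip.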

decode_argv

    pst.decode_argv(argv, delim=True, **kwargs)

Decode PST and split the resulting list into positional and named arguments.
argv is a list such as sys.argv and kwargs are keyword arguments passed
to pst.decode. Returns a tuple (args, opts), where args are positional
arguments and opts are named arguments. If delim is True, interpret a
standalone double-dash argument (--) in argv as an end of options delimiter,
after which all arguments are treated as literal string arguments.
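
The description does not spell out how the split is performed, so the
following Python sketch is only an illustration under stated assumptions, not
the pst implementation. It assumes that arguments after a standalone -- are
set aside as literal strings, and that decoded objects (dict) become named
arguments while everything else is positional. Both function names are
hypothetical.

```python
def split_at_delimiter(argv):
    """Split argv at the first standalone "--" (assumed delimiter handling)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return list(argv), []


def split_decoded(decoded):
    """Split a decoded list into positional and named arguments (assumption)."""
    args = [item for item in decoded if not isinstance(item, dict)]
    opts = {}
    for item in decoded:
        if isinstance(item, dict):
            opts.update(item)
    return args, opts


# Words before "--" would be decoded as PST; words after it are kept literal.
print(split_at_delimiter(["a:", "1", "--", "-x", "b:"]))

# The decoded form of "a: 1 b: 2 c" from the examples is [{"a": 1, "b": 2}, "c"].
print(split_decoded([{"a": 1, "b": 2}, "c"]))
```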

encode

    pst.encode(x, encoder=None, indent=False, indent_len='tab', flags=False, short_flags=False, long_flags=False, escape=False)

Encode a Python structure x consisting of list, tuple, dict, bytes, str, int
and float (either as scalars or nested) as PST. Returns bytes. encoder is a
user-defined function which transforms individual elements of the structure to
one of the above types before they are read by the encoder. If indent is true,
the output is indented. indent_len is the number of space characters
used for indentation, or 'tab' for indentation with the tab character. If
flags is true, key-value pairs with a value of true are encoded as flags. If
short_flags is true, key-value pairs with a value of true and a
single-character key are encoded as single-character flags. If long_flags is
true, key-value pairs with a value of true and a multiple-character key are
encoded as string flags. If escape is true, non-printable ASCII characters in
strings are encoded as escape sequences.

Installation

Linux

  1. Install the required system packages. On Debian-derived distributions
    (Ubuntu, Devuan, …):

        apt install python3-full python3-pip pipx

    On Fedora:

        sudo yum install python3 python3-pip pipx

  2. Install PST. If you intend to only use the command-line interface, you can
    install PST with pipx:

        pipx install pst-format

    You might have to add $HOME/.local/bin to the PATH environment variable
    if not present already in order to access the pst and pstf commands. This
    can be done with pipx ensurepath.

    If you intend to use the Python interface, you can install in the home
    directory with pip3:

        pip3 install pst-format

    Replace pip3 with pip if pip3 is not available. Add --break-system-packages
    if your distribution does not allow installing into the home directory but
    you want to anyway.

    Alternatively, install into a Python virtual environment with:

        python3 -m venv venv
        . venv/bin/activate
        pip3 install pst-format

    You can then use the PST Python interface from within the virtual
    environment. Deactivate the environment with deactivate.

You should now be able to run the commands pst and pstf.

Windows

  1. Install Python. In the installer, tick Add python.exe to PATH.

  2. Open the Command Prompt from the Start menu. Install PST with:

        pip3 install pst-format

You should now be able to run the commands pst and pstf.

macOS

Important: On macOS the pst command should be used with the command line
shell bash, not the default zsh, which is not compatible with the argument
syntax.

Open the Terminal. Install PST with:

    python3 -m pip install pst-format

Make sure that /Users/<user>/Library/Python/<version>/bin is included in the
PATH environment variable, where <user> is your system user name and <version>
is the Python version. This path should be printed by the above command. If it
is not included already, add the following line to the file .zprofile in your
home directory and restart the Terminal:

    PATH="$PATH:/Users/<user>/Library/Python/<version>/bin"

You should now be able to run the commands pst and pstf.

Uninstallation

To uninstall if installed with pipx:

    pipx uninstall pst-format

To uninstall if installed with pip3 or pip:

    pip3 uninstall pst-format

Replace pip3 with pip if pip3 is not available.

Shell compatibility

    mkdir example
    cd example
    mkdir a b
    pst *
    ["a", "b"]
    touch a/1 a/2 b/3 b/4
    pst a: { a/* } b: { b/* }
    [{"a": ["a/1", "a/2"], "b": ["b/3", "b/4"]}]
    # Better
    pst a: { $(ls a/* --quoting-style c) } b: { $(ls b/* --quoting-style c) }
    [{"a": ["1", "2"], "b": ["3", "4"]}]

Syntax

PST

PST is a sequence of words separated by white space, encoded in 8-bit ASCII.

White space characters

White space characters are space ( ), form feed (\f), newline (\n), carriage return (\r), horizontal tab (\t), and vertical tab (\v).

White space

White space is a sequence of white space characters.

Word

A word is a sequence of non-white space characters, together with any white
space characters inside a quoted part. A quoted part of a word is a part
of a word enclosed in double quotes ("). A character inside a word preceded by
a backslash (\) is escaped and is treated literally (loses its special
meaning), unless it forms one of the following ANSI C escape sequences, in
which case it is translated to the corresponding 8-bit ASCII character:

  • \a: alert/bell (7)
  • \b: backspace (8)
  • \e: escape (27)
  • \f: form feed (12)
  • \n: newline (10)
  • \r: carriage return (13)
  • \t: horizontal tab (9)
  • \v: vertical tab (11)
  • \nnn: octal value nnn, one to three digits
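
The escape rules above could be sketched in Python as follows. This is a
simplified illustration rather than the PST implementation: the function name
unescape is hypothetical, quotes are not processed, octal values above 255 are
truncated to 8 bits, and a trailing lone backslash is kept as-is (both of the
latter are assumptions).

```python
SIMPLE_ESCAPES = {
    ord("a"): 7, ord("b"): 8, ord("e"): 27, ord("f"): 12,
    ord("n"): 10, ord("r"): 13, ord("t"): 9, ord("v"): 11,
}
OCTAL_DIGITS = b"01234567"


def unescape(word: bytes) -> bytes:
    """Translate backslash escapes in a word to the corresponding bytes."""
    out = bytearray()
    i = 0
    while i < len(word):
        byte = word[i]
        if byte != ord("\\") or i + 1 == len(word):
            out.append(byte)
            i += 1
            continue
        nxt = word[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCTAL_DIGITS:
            # Consume one to three octal digits after the backslash.
            j = i + 1
            while j < len(word) and j < i + 4 and word[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(word[i + 1:j].decode("ascii"), 8) & 0xFF)
            i = j
        else:
            # Any other escaped character is taken literally.
            out.append(nxt)
            i += 2
    return bytes(out)


print(unescape(rb"a\tb\101"))
```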

Literal

Non-quoted words true, false, none are literals, and are interpreted as
true, false, null (respectively).

Integer

An integer is a word composed of non-quoted digits.

Floating-point number

A floating-point number is a word composed of non-quoted digits and a
non-quoted dot (.), beginning with a digit.

Number

A number is an integer or a floating-point number.

Bracket

A bracket is a word which is a non-quoted opening or closing curly
bracket ({, }).

Double bracket

A double bracket is a word which is a non-quoted opening or closing double
curly bracket ({{, }}).

Key

A word ending with a non-quoted colon (:) is a key.

String

A string is a word which is not a key, literal, number, bracket,
double bracket, single-character flag or a string flag.
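
The definitions above amount to a classification by elimination, which could
be sketched in Python as follows. This is a simplified illustration, not the
PST implementation: the function name classify is hypothetical, words are
assumed to contain no quoted or escaped parts, and flags (defined further
below) are recognised only by their leading dashes.

```python
import re

LITERALS = {"true", "false", "none"}


def classify(word: str) -> str:
    """Return the kind of a non-quoted word according to the definitions."""
    if word in ("{", "}"):
        return "bracket"
    if word in ("{{", "}}"):
        return "double bracket"
    if word in LITERALS:
        return "literal"
    if re.fullmatch(r"[0-9]+", word):
        return "integer"
    if re.fullmatch(r"[0-9]+\.[0-9]*", word):
        return "floating-point number"
    if word.endswith(":"):
        return "key"
    if word.startswith("--") and len(word) > 2:
        return "string flag"
    if word.startswith("-") and not word.startswith("--") and len(word) > 1:
        return "single-character flag"
    # Anything that is none of the above is a string.
    return "string"


for w in ["age:", "27", "1.5", "10021-3100", "none", "{{", "-x", "--abc"]:
    print(w, "->", classify(w))
```

Note that 10021-3100 from the complex example is classified as a string,
because it contains a character other than digits and a dot.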

Array

An array is a PST enclosed in curly brackets ({, }).

Value

A value is a string, literal, number, array or explicit object following a key.

Key-value pair

A key followed by a value is a key-value pair.

Implicit object

An implicit object is a sequence of one or more key-value pairs not enclosed
in brackets. An implicit object cannot be the value in a key-value pair.

Explicit object

An explicit object is a sequence of zero or more key-value pairs enclosed
in double brackets. An explicit object can be the value in a key-value pair.
Words inside the brackets which are not key-value pairs are ignored.

Single-character flag

Single-character flags are characters in a word beginning with a non-quoted
dash (-). A single-character flag is interpreted as an implicit object
{c: True}, where c is the character.

String flag

A string flag is a string in a word beginning with a non-quoted double dash
(--). A string flag is interpreted as an implicit object {s: True}, where s
is the string.
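
As an illustration, the expansion of flags into objects might look like this
in Python. The function name expand_flag is hypothetical, quoting is not
handled, and the results follow the flag examples given earlier (-ab and
--ab); it is a sketch, not the PST implementation.

```python
def expand_flag(word: str) -> dict:
    """Convert a flag word to the key-value pairs it stands for."""
    if word.startswith("--"):
        if len(word) > 2:
            # String flag: the whole string after "--" is one key.
            return {word[2:]: True}
    elif word.startswith("-") and len(word) > 1:
        # Single-character flags: every character after "-" is a key.
        return {c: True for c in word[1:]}
    raise ValueError(f"not a flag: {word!r}")


print(expand_flag("-ab"))
print(expand_flag("--ab"))
```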

Changelog

2.1.0 (2023-09-07)

  • Improved installation.

2.0.0 (2022-11-21)

  • Added a double-dash (--) delimiter option to decode_argv, which is enabled by default (potentially breaks compatibility).
  • Removed obsolete Python 2.7 code.

1.2.1 (2022-10-12)

  • Fixed Unicode encoding.
  • Fixed indentation of empty objects.
  • Fixed application of encoder.
  • Dropped support for Python 2.7.

1.2.0 (2022-07-30)

  • Added encode function.
  • Fixed parsing of empty strings.
  • Fixed closing of implicit object inside list.
  • Improved documentation.
  • Dropped support for Python 2.

1.1.1 (2019-10-28)

  • Added pstf.

1.0.0 (2019-10-28)

  • Support for explicit objects.

0.1.0 (2019-02-05)

  • Initial release.

License

Public domain. See LICENSE.md.