config_parser: Introduce stricter syntax conventions (#1377)
This is the next step to merge #1237 in stages.
Currently there are barely any restrictions on how the config can be
written. This causes things like config files with DOS line endings to
not be parsed properly (#1366) because polybar splits by `\n` and when
parsing section headers, it can't deal with the `\r` at the end of the
line and thus doesn't recognize any section headers.
With this PR we introduce some rules as to what characters are allowed
in section names and keys.
Note: When talking about spaces I refer to any character for which
`isspace()` returns `true`.
The rules are as follows:
* A section name or a key name cannot contain any spaces or any of these
characters: `"'=;#[](){}:.$\%`
* Spaces at the beginning and end of lines are always ignored when
parsing
* Comment lines start with `;` or `#` and last for the whole line. The
whole line will be ignored by the parser. You cannot start a comment at
the end of a line.
* Section headers have the following form `[HEADER_NAME]`
* Key-value lines look like this:
`KEY_NAME{SPACES}={SPACES}VALUE_STRING` where `{SPACES}` represents any
number of spaces. `VALUE_STRING` can contain any characters. If it is
*surrounded* with double quotes (`"`), those quotes will be removed;
this can be used to add spaces to the beginning or end of the value
* Empty lines are lines with only spaces in them
* If the line has any other form, it is a syntax error
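As a rough illustration of the rules above, the following self-contained C++ sketch trims a line, classifies it and validates a name. It is not the code from this PR: `trim_spaces`, `classify` and the free function `is_valid_name` are stand-ins written for this example, and rejecting empty names is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class line_type { KEY, HEADER, COMMENT, EMPTY, UNKNOWN };

// Removes leading and trailing characters for which isspace() is true.
// This also drops the trailing '\r' of DOS line endings.
std::string trim_spaces(const std::string& s) {
  std::size_t begin = 0;
  std::size_t end = s.size();
  while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) {
    ++begin;
  }
  while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
    --end;
  }
  return s.substr(begin, end - begin);
}

// Classifies an already trimmed line, mostly by its first character.
// It does not check whether the line is syntactically correct.
line_type classify(const std::string& trimmed) {
  if (trimmed.empty()) {
    return line_type::EMPTY;
  }
  switch (trimmed.front()) {
    case '[':
      return line_type::HEADER;
    case ';':
    case '#':
      return line_type::COMMENT;
    default:
      return trimmed.find('=') != std::string::npos ? line_type::KEY : line_type::UNKNOWN;
  }
}

// Returns true if the name is non-empty (assumption) and contains neither
// spaces nor any of the forbidden characters listed above.
bool is_valid_name(const std::string& name) {
  static const std::string forbidden = "\"'=;#[](){}:.$\\%";
  if (name.empty()) {
    return false;
  }
  for (char c : name) {
    if (std::isspace(static_cast<unsigned char>(c)) || forbidden.find(c) != std::string::npos) {
      return false;
    }
  }
  return true;
}
```

With this sketch, a header line such as `[bar/main]` followed by `\r` is still classified as a header once it has been trimmed, which is the case that failed in #1366.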
This will introduce the following breaking changes because of how
underdefined the config syntax was before:
* `key = ""` will get treated as an empty string instead of the literal
string `""`
* Any section or key name with forbidden characters will now be a syntax
error.
* Certain strings will be forbidden as section names: `self`, `root`,
`BAR`. They have a special meaning inside references, so a section such
as `[root]` could never be referenced.
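The first breaking change follows from how quoted values are handled. The sketch below is one way to split a trimmed key-value line and strip a surrounding pair of double quotes. `split_key_value` is a hypothetical helper, not the `parse_key` from this PR; splitting at the first `=` is an assumption that follows from `=` being forbidden in key names, and key validation is left out.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper, not the parse_key from this PR.
// Expects an already trimmed line of the form KEY{SPACES}={SPACES}VALUE.
std::pair<std::string, std::string> split_key_value(const std::string& line) {
  const std::size_t eq = line.find('=');
  if (eq == std::string::npos) {
    throw std::invalid_argument("line contains no '='");
  }

  // Key: everything before the first '=', minus trailing spaces.
  std::size_t key_end = eq;
  while (key_end > 0 && std::isspace(static_cast<unsigned char>(line[key_end - 1]))) {
    --key_end;
  }

  // Value: everything after the first '=', minus leading spaces.
  std::size_t value_begin = eq + 1;
  while (value_begin < line.size() && std::isspace(static_cast<unsigned char>(line[value_begin]))) {
    ++value_begin;
  }

  std::string key = line.substr(0, key_end);
  std::string value = line.substr(value_begin);

  // Strip one pair of surrounding double quotes, if present.
  if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
    value = value.substr(1, value.size() - 2);
  }
  return {std::move(key), std::move(value)};
}
```

Under these assumptions `key = ""` yields an empty value, and quoting is the only way to keep leading or trailing spaces in a value.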
This replaces the current parser implementation with a new more robust
one that will later be expanded to also check for dependency cycles and
allow for values that contain references mixed with other strings.
This PR also now expands the config paths given over the command line so
that `--config=~/.config/polybar/config` resolves properly.
Closes #1032
Closes #1694
* config_parser: Add skeleton with tests
First step in the config_parser development. Only tests functions that
are easily testable without many outside dependencies. Integration tests
will follow.
* config_parser: Implement parse_header
* config_parser: Implement get_line_type
* feat(string): Add trim functions with predicate
Not only trimming based on single character matching but based on a
freely specifiable predicate. Will be used to trim all spaces (based on
isspace)
* config_parser: Implement parse_key
* config_parser: Implement parse_line for valid lines
* config_parser: Throw exception on invalid lines
* config_parser: Remove line_no and file_index from parse_line
Cleaner to let the caller catch and fill in the line number and file
path
* string: Clear up misleading description of trim
Before, trim would remove all characters that *didn't* match the
predicate and thus the predicate isspace wouldn't work correctly.
Because we used the inverse (isnospace_pred), it all worked out, but
with any other predicate the function wouldn't have given the desired
output
* config_parser: Implement parse_file
* config_parser: Switch operation to config_parser
This changes the way the config is invoked. Now main.cpp creates a
config_parser object which then returns the singleton config object from
the parse method. Subsequent calls to config::make will return the
already created config object as before
The config_parser does not yet have all the functionality of the old
parser: `inherit` directives are not yet resolved. Other than that all
the old functionality is implemented (creating sectionmap and applying
include-file)
Any sort of dependency detection (except for include-file) is still
missing
* config: Move xrm initialization to constructor
config_parser handles the detection of xrdb references and passes that
info to the config object.
This finally allows us to delete the config::parse_file function because
everything in it has been implemented (except for xrdb detection and
file error handling)
* refactor(config_parser): Cleanup
* config_parser: Set config data after initialization
Looks much cleaner this way
* config_parser: Expand include-file paths
* config_parser: Init xrm if the config uses ${xrdb...} references
* config_parser: Use same type of maps as in old impl
Polybar has some weird, not yet fixed, inheriting behaviour and it
changes depending on the order in which the config stores its data.
Using the same type of maps ensures that the behaviour stays the same.
* refactor(config_parser): Clearer invalid name error message
* config_parser: Don't allow reserved section names
Sections with the names 'self', 'BAR', 'root' could never be referenced
because those strings have a special meaning inside references
* config_parser: Handle inherit directives
This uses the old copy_inherited function, so this still suffers from
crashes if there are cyclic dependencies.
This also fixes the behaviour where any key that starts with 'inherit'
would be treated as an inherit directive
* config_parser: Clearer dependency cycle error message
* refactor(config_parser): Handle file errors when parsing
This removes the need to check if the file exists separately
* fix(config): expand config file path
Now paths using ~ and environment variables can be used as the config
path
* fix(config): Properly recognize xrdb references
* config_parser: Make messages more informative
* doc(config): Improve commenting
Comments now describe what the config_parser actually does instead of
what it will do.
We also now follow the rule that single line comments inside functions
should use `//` comments
* refactor: Move else on same line as curly braces
* fix(config_parser): Don't duplicate paths in `files`
* refactor(config_parser): Use else if for clarity
* fix(config): Undefined behavior in syntax_error
Before the custom what() method produced undefined behavior because the
returned string became invalid once the function returned.
* refactor(config): descriptive name for useless lines
is_valid could easily be confused as meaning syntactically invalid
without it being clarified in a comment
* refactor(config): Use separate strings instead of key_value
Takes just as much space and is much better to read
* fix(config_parser): TestCase -> TestSuite and fix macro call
Ref: #1644
* config_parser: use const string& in method args
* config_parser: Improve comments
* config_parser: Incorporate review comments
2019-08-06 17:41:31 +00:00
#pragma once

#include <set>

#include "common.hpp"
#include "components/config.hpp"
#include "components/logger.hpp"
#include "errors.hpp"

POLYBAR_NS

DEFINE_ERROR(parser_error);

/**
 * \brief Exception object for syntax errors
 *
 * Contains filepath and line number where syntax error was found
 */
class syntax_error : public parser_error {
 public:
  /**
   * Default values are used when the thrower doesn't know the position.
   * The caller of parse_line has to catch, set the proper values and rethrow
   */
  explicit syntax_error(string msg, const string& file = "", int line_no = -1)
      : parser_error(file + ":" + to_string(line_no) + ": " + msg), msg(move(msg)) {}

  const string& get_msg() {
    return msg;
  }

 private:
  string msg;
};

class invalid_name_error : public syntax_error {
 public:
  /**
   * type is either Header or Key
   */
  invalid_name_error(const string& type, const string& name)
      : syntax_error(type + " name '" + name + "' is empty or contains forbidden characters.") {}
};

/**
 * \brief All different types a line in a config can be
 */
enum class line_type { KEY, HEADER, COMMENT, EMPTY, UNKNOWN };

/**
 * \brief Storage for a single config line
 *
 * More sanitized than the actual string of the config line, with information
 * about line type and structure
 */
struct line_t {
  /**
   * Whether or not this struct represents a "useful" line, a line that has
   * any semantic significance (key-value or header line)
   * If false all other fields are not set.
   * Set this to false, if you want to return a line that has no effect
   * (for example when you parse a comment line)
   */
  bool useful;

  /**
   * Index of the config_parser::m_files vector where this line is from
   */
  int file_index;
  int line_no;

  /**
   * We access header, if is_header == true otherwise we access key, value
   */
  bool is_header;

  /**
   * Only set for header lines
   */
  string header;

  /**
   * Only set for key-value lines
   */
  string key, value;
};

class config_parser {
 public:
  config_parser(const logger& logger, string&& file, string&& bar);

  /**
   * \brief Performs the parsing of the main config file m_config
   *
   * \returns config class instance populated with the parsed config
   *
   * \throws syntax_error If there was any kind of syntax error
   * \throws parser_error If anything else went wrong
   */
  config::make_type parse();

 protected:
  /**
   * \brief Converts the `m_lines` vector to a proper sectionmap
   */
  sectionmap_t create_sectionmap();

  /**
   * \brief Parses the given file, extracts key-value pairs and section
   *        headers and adds them onto the `m_lines` vector
   *
   * This method directly resolves `include-file` directives and checks for
   * cyclic dependencies
   *
   * `file` is expected to be an already resolved absolute path
   */
  void parse_file(const string& file, file_list path);

  /**
   * \brief Parses the given line string to create a line_t struct
   *
   * We use the INI file syntax (https://en.wikipedia.org/wiki/INI_file)
   * Whitespace characters (tested with isspace()) at the beginning and end
   * of a line are ignored
   * Keys and section names can contain any character except for the following:
   * - spaces
   * - equal sign (=)
   * - semicolon (;)
   * - pound sign (#)
   * - Any kind of parentheses ([](){})
   * - colon (:)
   * - period (.)
   * - dollar sign ($)
   * - backslash (\)
   * - percent sign (%)
   * - single and double quotes ('")
   * So basically any character that has any kind of special meaning is prohibited.
   *
   * Comment lines have to start with a semicolon (;) or a pound sign (#);
   * you cannot put a comment after another type of line.
   *
   * Key and section names are case-sensitive.
   *
   * Keys are specified as `key = value`; spaces around the equal sign, as
   * well as double quotes around the value, are ignored
   *
   * Sections are defined as [section]; everything inside the square brackets
   * is part of the name
   *
   * \throws syntax_error if the line isn't well formed. The syntax error
   *         does not contain the filename or line numbers because parse_line
   *         doesn't know about those. Whoever calls parse_line needs to
   *         catch those exceptions and set the file path and line number
   */
  line_t parse_line(const string& line);

  /**
   * \brief Determines the type of a line read from a config file
   *
   * Expects that line is trimmed
   * This mainly looks at the first character and doesn't check if the line is
   * actually syntactically correct.
   * HEADER ('['), COMMENT (';' or '#') and EMPTY (None) are uniquely
   * identified by their first character (or lack thereof). Any line that
   * is none of the above and contains an equal sign, is treated as KEY.
   * All others are UNKNOWN
   */
  static line_type get_line_type(const string& line);

  /**
   * \brief Parses a line containing a section header and returns the header name
   *
   * Only assumes that the line starts with '[' and is trimmed
   *
   * \throws syntax_error if the line doesn't end with ']' or the header name
   *         contains forbidden characters
   */
  string parse_header(const string& line);

  /**
   * \brief Parses a line containing a key-value pair and returns the key name
   *        and the value string inside an std::pair
   *
   * Only assumes that the line contains '=' at least once and is trimmed
   *
   * \throws syntax_error if the key contains forbidden characters
   */
  std::pair<string, string> parse_key(const string& line);

  /**
   * \brief Names of all the files the config includes values from
   *
   * The line_t struct uses indices to this vector to map lines to their
   * original files. This allows us to point the user to the exact location
   * of errors
   */
  file_list m_files;

 private:
  /**
   * \brief Checks if the given name doesn't contain any spaces or characters
   *        in config_parser::m_forbidden_chars
   */
  bool is_valid_name(const string& name);

  /**
   * \brief Whether or not an xresource manager should be used
   *
   * Is set to true if any ${xrdb...} references are found
   */
  bool use_xrm{false};

  const logger& m_log;

  /**
   * \brief Absolute path to the main config file
   */
  string m_config;

  /**
   * Is used to resolve ${root...} references
   */
  string m_barname;

  /**
   * \brief List of all the lines in the config (with included files)
   *
   * The order here matters, as we have not yet associated key-value pairs
   * with sections
   */
  vector<line_t> m_lines;

  /**
   * \brief None of these characters can be used in the key and section names
   */
  const string m_forbidden_chars{"\"'=;#[](){}:.$\\%"};

  /**
   * \brief List of names that cannot be used as section names
   *
   * These strings have a special meaning inside references and so the
   * section [self] could never be referenced.
   *
   * Note: BAR is deprecated
   */
  const std::set<string> m_reserved_section_names = {"self", "BAR", "root"};
};

POLYBAR_NS_END
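
The header's comments describe a catch-and-rethrow pattern: the function that parses a single line throws a `syntax_error` without position information, and its caller fills in the file path and line number. The following standalone sketch shows that pattern under stated assumptions. The `syntax_error` class here is a simplified stand-in built on `std::runtime_error`, `check_line` and `check_lines` are hypothetical names that do not exist in polybar, and 1-based line numbers are assumed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the syntax_error class above.
class syntax_error : public std::runtime_error {
 public:
  explicit syntax_error(std::string msg, const std::string& file = "", int line_no = -1)
      : std::runtime_error(file + ":" + std::to_string(line_no) + ": " + msg), m_msg(std::move(msg)) {}

  const std::string& get_msg() const {
    return m_msg;
  }

 private:
  std::string m_msg;
};

// Hypothetical line check: it knows nothing about files or line numbers,
// so it throws with the default position values.
void check_line(const std::string& line) {
  if (!line.empty() && line.find('=') == std::string::npos) {
    throw syntax_error("Unknown line type: " + line);
  }
}

// The caller knows the position, so it catches the error and rethrows a new
// one with the file path and (assumed 1-based) line number filled in.
void check_lines(const std::string& file, const std::vector<std::string>& lines) {
  for (std::size_t i = 0; i < lines.size(); ++i) {
    try {
      check_line(lines[i]);
    } catch (const syntax_error& err) {
      throw syntax_error(err.get_msg(), file, static_cast<int>(i) + 1);
    }
  }
}
```

Keeping the bare message available through `get_msg()` is what lets the caller rebuild the exception without parsing the already formatted `what()` string.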