  1. Feb 27, 2019
    • XPath: Make remove_duplicates generate stable order · c55ea3bc
      Arseny Kapoulkine authored
      Given an unsorted sequence, remove_duplicates would sort it using the
      pointer value of attributes/nodes and then remove consecutive
      duplicates.
      
      This was problematic because it meant that the result of XPath queries
      depended on the memory allocation pattern. While it's technically
      incorrect to rely on the order, this results in easy-to-miss bugs.
      
      This is particularly common when XPath queries use union operators,
      although remove_duplicates is also called in other cases.
      
      This change reworks the code to use a hash set instead, using the same
      hash function we use for compact storage. To make sure it performs well,
      we allocate enough buckets for count * 1.5 elements (assuming all
      elements are unique); since each bucket is a single pointer, unlike
      xpath_node which is two pointers, we need somewhere between 0.75x and
      1.5x the size of the node array in temporary storage.
      
      The resulting filtering is stable - we remove elements that we have seen
      before but we don't change the order - and is actually significantly
      faster than sorting was.
      
      With a large union operation, before this change it took ~56 ms per 100
      query invocations to remove duplicates, and after this change it takes
      ~20 ms.
      
      Fixes #254.
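      A minimal sketch of the stable, hash-set-based filtering described above. It assumes plain non-null pointers as keys instead of xpath_node attribute/node pairs; the names hash_pointer, hash_insert and remove_duplicates_stable are hypothetical, and the MurmurHash3 finalizer with triangular probing stands in for the hash actually used by compact storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash: MurmurHash3 32-bit finalizer applied to a pointer value.
inline unsigned int hash_pointer(const void* key)
{
    unsigned int h = static_cast<unsigned int>(reinterpret_cast<std::uintptr_t>(key));
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Inserts key into an open-addressing table whose size is a power of two.
// Returns true if the key was not present before.
inline bool hash_insert(std::vector<const void*>& table, const void* key)
{
    assert(key);
    std::size_t mask = table.size() - 1;
    std::size_t bucket = hash_pointer(key) & mask;

    for (std::size_t probe = 0; probe <= mask; ++probe)
    {
        if (table[bucket] == nullptr) { table[bucket] = key; return true; }
        if (table[bucket] == key) return false;
        bucket = (bucket + probe + 1) & mask; // triangular probing visits every slot
    }

    assert(false && "hash table is full"); // unreachable: load factor <= 2/3
    return false;
}

// Keeps the first occurrence of each pointer, in the original order.
// Null entries are skipped.
inline void remove_duplicates_stable(std::vector<const void*>& items)
{
    std::size_t hash_size = 1;
    while (hash_size < items.size() + items.size() / 2) hash_size *= 2; // >= count * 1.5

    std::vector<const void*> table(hash_size, nullptr);
    std::size_t write = 0;

    for (const void* item : items)
        if (item && hash_insert(table, item))
            items[write++] = item;

    items.resize(write);
}
```

      Because the table has at least 1.5x as many buckets as elements, it is never more than two-thirds full, so the probe loop always finds a free slot or the existing key.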
    • XPath: Create set for a|b in order before duplicate filtering · 93c7bacb
      Arseny Kapoulkine authored
      This does not change the result of a union operation [substantially], but
      it means that we now give remove_duplicates a list with a more natural
      ordering.

      If remove_duplicates didn't sort the array, union operations would
      produce a consistent, predictable order.
      
      Contributes to #254.
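      To illustrate the ordering this prepares for, here is a hypothetical sketch (union_in_order is not a pugixml function) that builds a|b by appending the nodes of a, then those of b, and then drops repeats without reordering. It uses std::unordered_set for brevity rather than a custom hash table.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical: concatenate a then b, keep only the first occurrence of each item.
template <typename T>
std::vector<T> union_in_order(const std::vector<T>& a, const std::vector<T>& b)
{
    std::vector<T> combined(a);
    combined.insert(combined.end(), b.begin(), b.end());

    std::unordered_set<T> seen;
    std::vector<T> result;
    for (const T& item : combined)
        if (seen.insert(item).second) // true only for the first occurrence
            result.push_back(item);
    return result;
}
```

      With a stable filter, the result lists everything from a first, followed by the items of b that were not already in a.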
  2. Jan 01, 2019
  3. Nov 24, 2018
    • Fix Wdouble-promotion warnings · f9a2a7d1
      Arseny Kapoulkine authored
      We had a few places in test code and library source where we used an
      implicit float->double conversion; while the conversion preserves the
      value exactly, gcc/clang implement this warning to make sure uses of
      double are intentional.

      This change also adds the warning to the Makefile to make sure we don't
      regress on it.
      
      Fixes #243.
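      A small sketch of the kind of code -Wdouble-promotion flags and the usual fix; the function names are hypothetical. Mixing a float with a double literal promotes the float implicitly, while an explicit static_cast states the intent and silences the warning without changing the value.

```cpp
#include <cassert>

// With -Wdouble-promotion, gcc/clang warn here: f is implicitly promoted to
// double because 2.0 is a double literal.
inline double scale_implicit(float f)
{
    return f * 2.0;
}

// Explicit conversion: same result (every float is exactly representable as a
// double), but the use of double is now visibly intentional.
inline double scale_explicit(float f)
{
    return static_cast<double>(f) * 2.0;
}
```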
  4. Nov 20, 2018
    • Escape TAB character in attribute values with &#9; · aac75cd2
      Arseny Kapoulkine authored
      This change modifies the table entries for ctx_special_attr to treat the
      TAB character as special, which makes the output code escape it.
      
      Before this change, trying to use TAB in an attribute value would output
      it verbatim; during subsequent parsing, pugixml - and other compliant
      parsers - would apply attribute-value normalization, turning the TAB
      into a space and losing the original value.
      
      Using &#9; fixes this; if an input document has &#9; in an attribute
      value, that gets unescaped into \t during parsing and escaped back into
      &#9; during output, which means we can now roundtrip values like this.
      
      Fixes #242.
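      A simplified sketch of why the escape matters. escape_attribute and normalize_attribute are hypothetical helpers, not pugixml code: the first escapes a handful of characters for a double-quoted attribute, including TAB as &#9;; the second applies XML 1.0 attribute-value normalization to a literal value, which is what turns an unescaped TAB into a space on reparse.

```cpp
#include <cassert>
#include <string>

// Hypothetical escaper for a double-quoted attribute value (subset of characters).
inline std::string escape_attribute(const std::string& value)
{
    std::string out;
    for (char ch : value)
    {
        switch (ch)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '"': out += "&quot;"; break;
        case '\t': out += "&#9;"; break; // a literal TAB would not survive reparsing
        default: out += ch;
        }
    }
    return out;
}

// Attribute-value normalization (XML 1.0, section 3.3.3) applied to literal
// characters: each TAB, CR or LF becomes a space. Character references such as
// &#9; are expanded after this step, so they keep their original value.
inline std::string normalize_attribute(const std::string& raw)
{
    std::string out(raw);
    for (char& ch : out)
        if (ch == '\t' || ch == '\n' || ch == '\r') ch = ' ';
    return out;
}
```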
  5. Nov 16, 2018
  6. Nov 12, 2018
  7. Oct 24, 2018
    • XPath: Workaround Coverity false positive · d9fadc74
      Arseny Kapoulkine authored
      Coverity hits a false positive similar to the one the clang static
      analyzer hit: it assumes that since optimize() checks _right for being
      nullptr, optimize_self() might see _right == nullptr in the ast_op_equal
      case, which is impossible.
      
      Contributes to #236.
  8. Oct 16, 2018
    • Remove warning in Visual Studio (#235) · 273fa0ab
      Lipsa, Dan authored
      The following warning is removed:
      Visual Studio 14.0
      1. warning C4275: non dll-interface class 'std::exception' used as
         base for dll-interface class 'vtkpugixml::xpath_exception'
  9. Sep 25, 2018
    • Work around clang --analyze warnings · 81c82588
      Arseny Kapoulkine authored
      clang doesn't understand the invariants guaranteed for specific AST node
      types and, when seeing null pointer checks in optimize(), assumes any
      pointers in the node might be null. Work around this by adding explicit
      (redundant) null pointer checks.
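      A hypothetical sketch of the pattern (ast_node, optimize and evaluate_add are illustrative names, not pugixml's): a traversal that tolerates null children leads the analyzer to treat every child pointer as possibly null, so a code path that relies on the non-null invariant gets an explicit, redundant check.

```cpp
#include <cassert>

// Hypothetical AST node: for binary operators both children are always set,
// an invariant the analyzer cannot see.
struct ast_node
{
    int value;
    ast_node* left;
    ast_node* right;
};

// Stand-in for optimize(): it only walks the tree, but its null checks tell
// the analyzer that child pointers may be null.
inline void optimize(ast_node* n)
{
    if (n->left) optimize(n->left);
    if (n->right) optimize(n->right);
}

// Children of an "add" node are guaranteed non-null; the check below is never
// taken for valid trees but stops the analyzer from reporting a null dereference.
inline int evaluate_add(const ast_node* n)
{
    if (!n->left || !n->right) return 0; // redundant by construction
    return n->left->value + n->right->value;
}
```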
    • XPath: Refactor xpath_node_set short buffer optimization · e3b5e9ce
      Arseny Kapoulkine authored
      This change replaces xpath_node_set single element storage with a
      single-element array in hopes that this would silence a Coverity false
      positive about getting a singleton pointer.
      
      Additionally, it refactors _assign member to unify small and large
      buffer codepaths since they are basically identical.
      
      Fixes #233 (hopefully)
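      A hypothetical sketch of the refactored shape, using int elements instead of xpath_node: one element lives inline in a single-element array, larger inputs go to the heap, and a single assign path handles both by choosing the destination first and then copying. The class and member names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

class small_set
{
public:
    small_set(): _begin(_storage), _end(_storage) {}
    ~small_set() { if (_begin != _storage) std::free(_begin); }

    small_set(const small_set&) = delete;
    small_set& operator=(const small_set&) = delete;

    // Unified small/large path: pick storage, copy, then release the old buffer.
    bool assign(const int* begin, const int* end)
    {
        std::size_t count = static_cast<std::size_t>(end - begin);
        int* storage = _storage;

        if (count > 1)
        {
            storage = static_cast<int*>(std::malloc(count * sizeof(int)));
            if (!storage) return false; // old contents are left untouched
        }

        if (count) std::memcpy(storage, begin, count * sizeof(int));
        if (_begin != _storage) std::free(_begin); // freed after the copy

        _begin = storage;
        _end = storage + count;
        return true;
    }

    std::size_t size() const { return static_cast<std::size_t>(_end - _begin); }
    const int* data() const { return _begin; }

private:
    int _storage[1]; // single-element array instead of a single int member
    int* _begin;
    int* _end;
};
```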
  10. Jul 28, 2018
  11. Apr 15, 2018
  12. Apr 03, 2018
    • Update version to 1.9 · 0c74e117
      Arseny Kapoulkine authored
    • Work around gcc-8 warning · 4f9af798
      Arseny Kapoulkine authored
      gcc-8 produces an "attribute directive ignored" warning for
      no_sanitize("unsigned-integer-overflow"). At some point gcc will
      introduce integer sanitizer support and we'll have to do this all over
      again, but for now just don't emit the attribute on gcc.
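      A sketch of how such an attribute can be emitted only where it is understood. MY_UNSIGNED_OVERFLOW and hash_bytes are hypothetical names; the macro expands to the clang attribute when __has_attribute(no_sanitize) is available and to nothing on other compilers, so gcc never sees it.

```cpp
#include <cassert>

#if defined(__clang__) && defined(__has_attribute)
#   if __has_attribute(no_sanitize)
#       define MY_UNSIGNED_OVERFLOW __attribute__((no_sanitize("unsigned-integer-overflow")))
#   endif
#endif
#ifndef MY_UNSIGNED_OVERFLOW
#   define MY_UNSIGNED_OVERFLOW // gcc and others: emit nothing
#endif

// Unsigned wraparound is well-defined; the attribute only keeps clang's optional
// -fsanitize=unsigned-integer-overflow check quiet for this FNV-1a hash.
MY_UNSIGNED_OVERFLOW inline unsigned int hash_bytes(const char* s)
{
    unsigned int h = 2166136261u; // FNV-1a offset basis
    for (; *s; ++s)
        h = (h ^ static_cast<unsigned char>(*s)) * 16777619u;
    return h;
}
```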
  13. Mar 29, 2018
  14. Mar 17, 2018
    • ubsan: Fix undefined behavior for signed left shift in compact mode · e50672cf
      Arseny Kapoulkine authored
      We were using << compact_alignment_log2 instead of * compact_alignment
      for symmetry with the encoding, where >> is crucial to keep the code fast
      and to round toward negative infinity.

      For decoding, the results are the same, and any reasonable compiler
      should convert *4 into <<2, so just use a multiplication; unlike a left
      shift, it doesn't trigger UB on negative numbers.
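      A sketch of the encode/decode asymmetry, assuming an alignment of 4 (alignment_log2 == 2); the function names are hypothetical. Right-shifting a negative int is implementation-defined before C++20, and gcc/clang implement it as an arithmetic shift that rounds toward negative infinity; left-shifting a negative int is undefined before C++20, so decoding multiplies instead.

```cpp
#include <cassert>

const int alignment_log2 = 2; // assumed value for illustration
const int alignment = 1 << alignment_log2;

// Encoding keeps >>: arithmetic shift on gcc/clang, rounds toward -infinity.
inline int encode_offset(int byte_offset)
{
    return byte_offset >> alignment_log2;
}

// Decoding: `encoded << alignment_log2` would be UB for negative values;
// the multiplication gives the same result and compilers emit a shift anyway.
inline int decode_offset(int encoded)
{
    return encoded * alignment;
}
```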
  15. Mar 16, 2018
  16. Mar 03, 2018
  17. Mar 02, 2018
  18. Feb 27, 2018
    • Enables usage of override specifier for MSVC compilers (beginning with 17.0... · b8d1d07a
      Brandl, Matthäus (MBR) authored
      Enables use of the override specifier for MSVC compilers, beginning with version 17.0 (the compiler shipped with Visual Studio 2012).
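      A sketch of the usual macro shape for this; MY_OVERRIDE, writer and counting_writer are hypothetical names. The _MSC_VER check is needed because MSVC reports __cplusplus as 199711L by default even though version 17.0 (_MSC_VER 1700) and later accept override.

```cpp
#include <cassert>

#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1700)
#   define MY_OVERRIDE override
#else
#   define MY_OVERRIDE
#endif

struct writer
{
    virtual ~writer() {}
    virtual int write(const char* data, int size) = 0;
};

struct counting_writer: writer
{
    int total;
    counting_writer(): total(0) {}

    // With MY_OVERRIDE expanding to `override`, a signature mismatch with the
    // base class becomes a compile error instead of a silent new function.
    int write(const char*, int size) MY_OVERRIDE { total += size; return size; }
};
```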
    • Fix Texas Instruments compiler warning · b127cfb1
      Arseny Kapoulkine authored
      Texas Instruments compiler produces this warning for unused template
      member functions:
      
      	"pugixml.cpp", line 253: warning #179-D: function
      	"pugi::impl::<unnamed>::auto_deleter<T>::release [with
      	T=pugi::impl::<unnamed>::xml_stream_chunk<char>]" was declared but
      	never referenced
      
      As far as I can tell, this is a compiler issue: these functions should
      not be instantiated in the first place. While it's possible to rework
      the code to work around this, the changes would be fragile. It seems
      best to just disable this warning; we've seen something similar on SNC
      (which appears to use the same frontend!).
      
      Fixes #182.
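      A sketch of a compiler-guarded suppression, assuming the TI toolchain accepts `#pragma diag_suppress` with the diagnostic number from the warning above (179); other compilers never see the pragma. The auto_deleter below is a hypothetical stand-in with an unused template member, not pugixml's class.

```cpp
#include <cassert>

#ifdef __TI_COMPILER_VERSION__
#   pragma diag_suppress 179 // function was declared but never referenced
#endif

// Hypothetical RAII holder: release() may go unused in a given translation unit.
template <typename T> struct auto_deleter
{
    T* data;

    explicit auto_deleter(T* data_): data(data_) {}
    ~auto_deleter() { delete data; }

    auto_deleter(const auto_deleter&) = delete;
    auto_deleter& operator=(const auto_deleter&) = delete;

    T* release() { T* result = data; data = nullptr; return result; }
};
```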
  19. Feb 22, 2018
    • Work around gcc issues with limits.h not defining LLONG_MIN · 2ec3579f
      Arseny Kapoulkine authored
      It looks like there are several cases where this might happen:
      
      - In some MinGW distributions, the LLONG_MIN/etc defines are guarded
      with:
      
      	#if !defined(__STRICT_ANSI__) && defined(__GNUC__)
      
      Which means that you don't get them in strict ANSI mode. The previous
      workaround was specifically targeted towards this.
      
      - In some GCC distributions (notably GCC 6.3.0 in some configurations),
      LLONG_MIN/etc. defines are guarded with:
      
      	#if (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
      
      But __STDC_VERSION__ isn't defined as C99 even if you use -std=c++14 -
      which is probably technically valid, but not useful.
      
      To work around this, redefine the symbols whenever we are building with
      GCC and we need them and they aren't defined - doing this is better than
      not building. Instead of hard-coding the constants, use GCC-specific
      __LONG_LONG_MAX__ to compute them.
      
      Fixes #181.
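      A sketch of the described workaround, not necessarily the exact patch: if <climits> left the long long limits undefined under GCC, derive them from the GCC-predefined __LONG_LONG_MAX__ rather than hard-coding constants. On a normal toolchain the guard is false and the standard definitions are used.

```cpp
#include <cassert>
#include <climits>
#include <limits>

#if defined(__GNUC__) && !defined(LLONG_MAX) && !defined(LLONG_MIN) && !defined(ULLONG_MAX)
#   define LLONG_MAX __LONG_LONG_MAX__         // predefined by GCC (and clang)
#   define LLONG_MIN (-LLONG_MAX - 1)          // two's complement minimum
#   define ULLONG_MAX (LLONG_MAX * 2ULL + 1)   // 2^64 - 1 for 64-bit long long
#endif
```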
  20. Jan 08, 2018
  21. Nov 14, 2017
  22. Nov 13, 2017
    • Fix -Wshadow warning · 3860b507
      Arseny Kapoulkine authored
    • Implement correct move error handling for compact mode · 4bd8771c
      Arseny Kapoulkine authored
      In compact mode, we currently cannot support zero-allocation moves
      since some pointer assignments required during the move need to allocate
      hash table slots.
      
      This is mostly applicable to xml_document_struct::first_child, since the
      pointer to this element is used as a hash table key, but there are some
      contrived cases where parents of root's children need a hash slot and
      didn't have it before.
      
      These cases can be fixed by changing the compact encoding to be a bit
      more move friendly, but for now it's easier to handle the error and
      throw/return during move.
      
      When this happens, the source document doesn't change.
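      A hypothetical sketch of the "prepare, then commit" shape that keeps the source unchanged on failure. The doc struct and move_document are illustrative: a vector stands in for the compact hash table, all fallible allocation happens first, and only non-throwing swaps and clears run after that.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <vector>

struct doc
{
    std::vector<int> nodes;      // stand-in for the node tree
    std::vector<int> hash_slots; // stand-in for the compact-mode hash table
};

// Moves source into a clean target; returns false if the larger table cannot
// be allocated, in which case neither document has been modified.
inline bool move_document(doc& target, doc& source, std::size_t extra_slots)
{
    assert(target.nodes.empty() && target.hash_slots.empty()); // clean target

    std::vector<int> new_slots;
    try
    {
        new_slots.reserve(source.hash_slots.size() + extra_slots);
    }
    catch (const std::exception&) // std::bad_alloc or std::length_error
    {
        return false;
    }

    // Both stay within the reserved capacity, so no reallocation happens here.
    new_slots.insert(new_slots.end(), source.hash_slots.begin(), source.hash_slots.end());
    new_slots.resize(new_slots.size() + extra_slots, 0);

    // Commit: only non-throwing operations from here on.
    target.nodes.swap(source.nodes);
    target.hash_slots.swap(new_slots);
    source.nodes.clear();
    source.hash_slots.clear();
    return true;
}
```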
    • Add count argument to compact_hash_table::rehash/reserve · 91a3c288
      Arseny Kapoulkine authored
      This allows us to do a single reserve for a known number of assignments
      that is larger than the default minimum per reserve (16).
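      A hypothetical sketch of the sizing logic only (table_sizing is not pugixml's compact_hash_table): reserve takes a count that defaults to 16, and when that many extra insertions would push the table past a 3/4 load factor, a single rehash grows it enough for the whole batch.

```cpp
#include <cassert>
#include <cstddef>

struct table_sizing
{
    std::size_t count = 0;
    std::size_t capacity = 0;

    // Ensures `extra` more insertions fit without exceeding a 3/4 load factor.
    void reserve(std::size_t extra = 16)
    {
        if (count + extra >= capacity - capacity / 4)
            rehash(count + extra);
    }

    void rehash(std::size_t needed)
    {
        std::size_t new_capacity = 32;
        while (new_capacity - new_capacity / 4 <= needed) new_capacity *= 2;
        capacity = new_capacity; // a real table would reinsert every key here
    }
};
```

      Reserving 100 slots up front costs one rehash, instead of several rounds of growth in steps of 16.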
  23. Oct 21, 2017
    • Clarify a note about compact hash behavior during move · 3af93a39
      Arseny Kapoulkine authored
      After move some nodes in the hash table can have keys that point to
      other; this makes the table somewhat larger but this does not impact
      correctness.
      
      The reason is that for us to access a key in the hash table, there
      should be a compact_pointer/string object with the state indicating that
      it is stored in a hash table, and with the address matching the key. For
      this to happen, we had to have put this object into this state which
      would mean that we'd overwrite the hash entry with the new, correct
      value.
      
      When nodes/pages are being removed, we do not clean up keys from the
      hash table - it's safe for the same reason, and thus move doesn't
      introduce additional contracts here.
  24. Sep 26, 2017
    • Fix -Wshadow warning · febf25d1
      Arseny Kapoulkine authored
    • Implement move support for xml_document · a567f12d
      Arseny Kapoulkine authored
      This change implements the initial version of move construction and
      assignment support for documents.
      
      When moving a document to another document, we always make sure move
      target is in "clean" state (empty document), and proceed by relocating
      all structures in the most efficient way possible.
      
      Complications arise from the fact that the root (document) node is
      embedded into xml_document object, so all pointers to it have to change;
      this includes parent pointers of all first-level children as well as
      allocator pointers in all memory pages and previous pointer in the first
      on-heap memory page.
      
      Additionally, compact mode makes everything even more complicated
      because some of the pointers we need to update are stored in the hash
      table (in fact, document first_child pointer is very likely to be there;
      some parent pointers in first-level children will be using
      compact_shared_parent but some won't be) which requires allocating a new
      hash table which can fail.
      
      Some details of this process are not fully fleshed out, especially for
      compact mode; and this definitely requires many tests.
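      A hypothetical sketch of the core complication: because the root node is embedded in the document object, a move constructor has to re-point the parent of every first-level child at the new root. The node and document types here are illustrative; the allocator and memory-page pointers the real change also updates are omitted.

```cpp
#include <cassert>
#include <utility>

struct node
{
    node* parent = nullptr;
    node* first_child = nullptr;
    node* next_sibling = nullptr;
};

struct document
{
    node root; // embedded: its address changes when the document moves

    document() = default;
    document(const document&) = delete;
    document& operator=(const document&) = delete;

    document(document&& other) noexcept
    {
        // Take over the children, then fix pointers that referred to other.root.
        root.first_child = other.root.first_child;
        for (node* child = root.first_child; child; child = child->next_sibling)
            child->parent = &root;

        other.root.first_child = nullptr; // source becomes an empty document
    }
};
```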
  25. Jul 18, 2017
    • Fix Clang/C2 compatibility · 77d7e603
      Arseny Kapoulkine authored
      Clang/C2 does not implement __builtin_expect; additionally we need to
      work around deprecation warnings for fopen by disabling them.
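      A sketch of a branch-hint macro with a fallback for compilers that lack __builtin_expect. MY_UNLIKELY and checked_div are hypothetical, and detecting Clang/C2 through the __c2__ macro is an assumption about that toolchain's predefined macros.

```cpp
#include <cassert>

#if defined(__GNUC__) && !defined(__c2__)
#   define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#   define MY_UNLIKELY(cond) (cond) // no hint, same semantics
#endif

inline int checked_div(int a, int b)
{
    if (MY_UNLIKELY(b == 0)) return 0; // rare error path
    return a / b;
}
```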
  26. Jun 23, 2017
  27. Jun 22, 2017
  28. Jun 19, 2017
    • Change PUGI__SNPRINTF to use _countof for MSVC · 208e2cf0
      Arseny Kapoulkine authored
      The macro only works correctly when the input argument is an array with
      a statically known size; pointers, or arrays that have decayed to
      pointers, silently produce the wrong size.
      
      While this is unlikely to surface issues that aren't caught in
      tests/code review, use _countof for MSVC to prevent such code from
      compiling.
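      A sketch contrasting the two approaches, with hypothetical names. The sizeof-based macro also compiles for a pointer and then yields sizeof(pointer) / sizeof(element); a template that only binds to a real array (the idea behind MSVC's _countof in C++) turns that mistake into a compile error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Compiles for pointers too, silently giving the wrong count.
#define NAIVE_COUNT(a) (sizeof(a) / sizeof((a)[0]))

// Only accepts arrays of known size N; passing a pointer fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t array_count(T (&)[N]) { return N; }

inline int format_number(char (&buffer)[32], int value)
{
    return std::snprintf(buffer, array_count(buffer), "%d", value);
}
```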
  29. Jun 16, 2017