While building an AS3 Editor i faced some performance issues which was only able to be solved by applying many optimizing techniques like : buffering, caching, string indexing, binary search, lazy loading, dirty flags … This note will give a brief an insights onto the problems and the techniques used.
As a common text editor approach we might do :
- Split the texts into lines save into an Array, say, rawArr.
- Apply format for each item save into another Array, say, formattedArr
- Compose a string by concatening formatted lines of visible lines once the view changes
- Apply Threading in step 1 + 2 to prevent player hang up if the source file is large.
- When user edit a line, change rawArr, then apply format to that item and save in formattedArr then update the view.
Although this approach seems intuitive, we faced two big performance issues :
- It’s so CPU intensive – although using Threading, we need to wait for first time parsing which might be more than seconds for large files. This, although not a big problem will frustrate user in the long run as they may need to open / close multiple files all the time.
- It costs much memory – Memory is more than being doubled caused we need to save two version : the formated one for rendering, and the raw for editing. This although seems to be harmless, but think of the case when we have many big files open at the same time, keeping the memory as low as possible is a must.
Let’s think a little bit more :
- Why should we need to parse all the text right away as user only see a part of the text a time ? – Why not just parse the text on demand ? parse just lines user can see and we got instant parsing experience with much lower memory foot print (we don’t need a formatted array) – This is also just half the needed memory.
- Instantly parsing about 100 visible lines although is not a big processing problem but might sometimes can hang the player up if the text line is long. We actually can work more effective by using caching to keep track of lasted 100 lines of formatted text. This way, on scrolling the view down / up slowly, we only need to parse one or some more new line at a time and by doing that we decrease the needed-to-be-parsed lines from about 100 to around 5. This is a huge processing improvement !
- Searching for the EOL character in the whole source text and put into raw Array might be time consuming, although it’s lure to use Threading to solve this, it’s not an effective approach.
String indexing technique is used to boost the performance this final case. We firstly need to split the raw source into smaller chunks, which we called text blocks :
- A text block will be snapped to the EOL / EOF character that contains roughly ~ 100 lines x 100 chars per line.
- A text block will have 2 important int values : 1 for lines count (nLines) and 1 for first line position (stLine)
- The first block of text will be parsed instantly and available for user to edit. A thread will run through all the text blocks to count the number of lines each block contains (nLines) and update stLine for each block and the total lines available of the editor. The raw string data in text block will be nullize after migrating into raw array. We actually can update ~ 10 blocks per frame that means a 100k lines of source will be indexed in just 10 frames !
- As user scroll down the available lines (indexed lines), we can apply binary search on stLine of blocks to quickly find the blocks in charge and parse that block then get the needed lines (actually as we choosed to have about 100 lines per block, we only need to parse maximum 2 blocks a time)
- Parsing a block involving break the original source into lines at EOL positions and apply format for the lines then save the details into a corresponding block detail instance as the amount of detail information is large and not needed to available all the time, we use cache to manage these information to keep memory foot print as low as possible (we only save from 2-4 blocks details for better performance and memory). The block detail will contains only formatted array of lines.
- We need to use a time stamp in both text block and block detail to ensure that the cached version is always up to date, this only useful in the “Find and replace in the whole source” case, where we won’t need to show up the formatted version (we only update text blocks and visible block details).
- As user edit a line, only that raw line (in the text block) and corresponding formatted line (in block detail) being updated. When user insert or remove a line it will be affected the current block but the stLine index for other blocks is changed and we need to update them all. But we won’t update it right away, there’s no rush for it. We use an integer dirty flag to show the clean position (the lines before this flag value is clean – that means their stLine is correct), and another int stLineOffset to keep track of the number of lines changed (can be both negative or positive).
- We need to update the view for each line changes (add lines, remove lines) by modifying the text block and the corresponding block detail. When updates the view, we only need to clean up flag value to the end of visible lines. When user scroll down, the dirty stLine will be updated along the way.
More problematic than that, when considering support contexted based highlighter (AS3 will have some contexts : inline XML, inline String, comment, block comment, regexp ….) we came across multiple lines contexts that may be accidentally broken apart into two or more blocks, then we need to perform additional checks (usually by using regexp or matching start / end ) and save an endContext variable. If the context is not yet ended it should be non null. A block will be parse based on the previous text block’s context.
This seems to be quite a bit complex but it’s really worth the efforts. As a side note, using the text block will enabled user to edit and expand the number of lines in each block. We can also re parse the whole text when there is enough changes to make the blocks dirty (like has more than 4 empty blocks, or has a block that has nLines more than 200 …) and user interaction is idled or not on focus for some time to keep the editing process as smooth as possible.