Split Uppercase words in Excel
Here's a worksheet function solution. It ain't pretty, but if you're totally averse to using VBA, then I think you're stuck with ugly options only. For text in
A1, paste the following into
B1 and press Ctrl+Shift+Enter to enter the formula as an array formula:
=IFERROR(INDEX(IF(CODE(MID(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1))<=90,IF(CODE(MID(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1))>=65,IF(MID(D1,ROW(INDIRECT("A1:A"&LEN(D1)-1)),1)<>" ",REPLACE(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1," "&MID(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1)),D1),D1),D1),MIN(IF(CODE(MID(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1))<=90,IF(CODE(MID(D1,ROW(INDIRECT("A2:A"&LEN(D1))),1))>=65,IF(MID(D1,ROW(INDIRECT("A1:A"&LEN(D1)-1)),1)<>" ",ROW(INDIRECT("A1:A"&LEN(D1)-1)),2000000),2000000),2000000))),D1)
I told you it was ugly!
And for all that effort, that will only split the first and second name. For more splits, fill the formula over to the right. So for example, if you have a list of names in
A1:A10, and you think the most words in any name is four, you could enter the formula in
B1 (as an array formula!), fill down to
B10, then fill right to
E10. Your list of split names will be in
If you're inclined to jump down the rabbit hole, here's a brief explanation of what the formula does:
- Check each character to see if it is in the ASCII range for capital letters and not preceded by a space. The first character of the name is skipped.
- An array equal in size to the length of the string (minus 1) is populated as follows: If a match is found, the string is stored with the matching character replaced by a space preceding itself. If no match is found, the original string is stored.
- The first element from this array that corresponds to a match is returned. If no match is found, the original string is returned.
I would like to split all words in my cell by Uppercase, an example:
MikeJones RinaJonesJunior MichealSamuelsLurth
Mike Jones Rina Jones Junior Micheal Samuels Lurth
Can this be done without using VBA?
Calculating Molecular Weight Using Excel
While your request is marginally possible through some pretty complex (and CPU intensive) formulas using nothing but native Excel functions, a VBA User Defined Function or UDF would be vastly more appropriate. I'm not a chemist so please excuse the additions to your single sample I've provided as they were stolen shamelessly from an Internet page. TBH, I'm not even sure if I have half of the terminology correct.
Step 1 - Create a table of molecular weights and name it
You are going to require some form of cross-reference to retrieve the molecular weights from the element's periodic symbols. Here is what I scraped together. I'll supply a link to the full table of data in a sample workbook below.
With that on a worksheet named Element Data, go to
Formulas ► Defined Names ► Name Manger and give the cross-reference matrix a defined name.
Here I've used a formula (
=OFFSET('Element Data'!$A$1,0,0,COUNTA( 'Element Data'!$A:$A),6)) to define the range but the size of the data is fairly static so a cell range reference should be more than sufficient.
Step 2 - Add the code for a User Defined Function
Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to
Insert ► Module (Alt+I+M). Paste the following into the new pane titled something like Book1 - Module1 (Code).
Public Function udf_Molecular_Weight(sCMPND As String) As Double Dim sTMP As String, i As Long, sEL As String, sSB As String Dim dAW As Double, dAWEIGHT As Double, dSUB As Long sTMP = sCMPND: dAWEIGHT = 0: sSB = "0": sEL = vbNullString Do While CBool(Len(sTMP)) sSB = "0": sEL = vbNullString If Asc(Mid(sTMP, Application.Min(2, Len(sTMP)), 1)) > 96 Then sEL = Left(sTMP, 2) Else sEL = Left(sTMP, 1) End If sTMP = Right(sTMP, Len(sTMP) - Len(sEL)) Do While IsNumeric(Left(sTMP, 1)) sSB = sSB & Int(Left(sTMP, 1)) sTMP = Right(sTMP, Len(sTMP) - 1) Loop 'Debug.Print sEL & ":" & (Int(sSB) - (Not CBool(Int(sSB)))) dAWEIGHT = dAWEIGHT + Application.VLookup(sEL, ThisWorkbook.Names("tblPeriodic").RefersToRange, 6, False) * (Int(sSB) - (Not CBool(Int(sSB)))) Loop udf_Molecular_Weight = dAWEIGHT End Function Public Function udf_Styled_Formula_Alt(sCMPND As String) As String Dim sb As Long, sCOMPOUND As String sCOMPOUND = sCMPND For sb = 0 To 9 sCOMPOUND = Replace(sCOMPOUND, sb, ChrW(8320 + sb)) Next sb udf_Styled_Formula_Alt = sCOMPOUND End Function Public Function udf_Unstyled_Formula_Alt(sCMPND As String) As String Dim sb As Long, sCOMPOUND As String sCOMPOUND = sCMPND For sb = 0 To 9 sCOMPOUND = Replace(sCOMPOUND, ChrW(8320 + sb), sb) Next sb udf_Unstyled_Formula_Alt = sCOMPOUND End Function
Only the first of those is pertinent to your posted question. The latter two stylize the compound's chemical formula with Unicode subscript characters and reverse the process.
When you have completed the paste, tap Alt+Q to return to your worksheet. These UDF functions can now be used within your workbook just as any native Excel function can. The syntax is as simple as I could muster.
=udf_Molecular_Weight(<single cell with compound formula in plain text>)
For your sample compound (in the data image above) this would be,
With 9000+ of these, I suspect you'll use the former method. Fill down as necessary. While this UDF is vastly more efficient than convoluted array formulas using
INDIRECT and other native worksheet functions, they are not magic. Test the formula on a few hundred rows before committing to the 9000+ so you know what to expect. The other two UDFs work in much the same fashion should you choose to put them to use.
As promised, here is a link to the sample .XLSB workbook I created for this purpose for you to download and reference.
That link will remain active for a while. If I change its location to more permanent storage, I will adjust the link here.
By 'variable declarations', I'm guessing you actually mean 'variable assignments'. I tend to write fairly tight code and I've taken what others would put into up to 4 code lines into a single line by stacking the zeroing of the variables with a colon. I turn this,
sTMP = sCMPND dAWEIGHT = 0 sSB = "0" sEL = vbNullString
... into this,
sTMP = sCMPND: dAWEIGHT = 0: sSB = "0": sEL = vbNullString
One of the worst mistakes ever made in the IT industry was an accountant's decision to pay programmers for every line of code they wrote.
The variables need to be reset before reentering the loops but it's a mundane task so I simply cram all four assignments into a single line.
Do While ... Loop crawl through the string that was passed into the function character by character. The inner loop deals exclusively with numbers. Each pass through the loop truncates the string from the left, shortening it by one or more characters and collecting those characters as either the symbol of a element or the number associated with its use in the organic compound. Eventually there is nothing left to truncate (length=0) and that is where
CBool(Len(sTMP)) becomes False and the loop ends. The inner loop performs much the same way but collects numeric digits until it reaches no length or an alphabetic character. After an element (and a possible numeric modifier) has been collected, the molecular weight for that element within the compound is calculated with a
VLOOKUP against the molecular weight table and added to a growing number. When all elements and their associated number has been gathered and added into the grand total, the total is returned as the result of the function.
Cleaning bad data in excel, splitting words by capital letters
My function from Split Uppercase words in Excel needs udpdating for your additional string matching.
One assumption your cleansing does make is people have only two names, so
Ben OliverCarlos Alberto
is broken to
is that actually what should happen? (needs a minor tweak if so)
Function SplitCaps(strIn As String) As String Dim objRegex As Object Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .Pattern = "([a-z])([A-Z])" SplitCaps = Replace(.Replace(strIn, "$1, $2"), "<br>", ", ") End With End Function